Article: The SML file format: Alternatives to SML

There are a number of popular alternatives to the SML format used in this framework. This section describes some of them, compares them with SML, and discusses the results.

Since some users can be expected to edit configuration files by hand, the readability of the source is crucial, and the syntax should be intuitive. This is particularly important because, at present, a graphical interface can be provided only for a subset of the framework's configuration settings. Given the wealth of options, full coverage seems unlikely in the near future, and it is questionable whether it would even be desirable. Finally, experienced users may deliberately ignore a "cumbersome" GUI because they feel it slows them down.

Performance is another factor, because initializing the framework may be time-critical. As the files are read on each call of the framework, reading them must be efficient. While it does not always make sense to compare the "speed" of different implementations, the comparison should not be avoided where it is possible.

It is also useful to look at the limitations of a file format. If a format has knock-out criteria under certain preconditions, these should be mentioned.

XML parser

One of the strengths of XML is its portability across programming languages: XML parsers are integrated into nearly all modern languages. However, this adds no value when the data is meant to initialize a framework that is available in only one programming language.

The current tool of choice for handling XML files, available in both PHP 4 and PHP 5, is the native XML parser. However, this parser has a weakness: it can only read XML files, not write them. A standard component for writing XML files is available as of PHP 5.

It should also be mentioned that obtaining an array from an XML file with the XML parser would not require less effort than writing the whole script that reads SML files. There is also the risk, however unlikely, that the implementation of the XML parser changes in a later version of the language, as with any third-party implementation. In addition, this approach has no real advantage other than the fact that XML is popular and widely accepted.
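To give an impression of the effort involved, the following is a minimal sketch of how the event-based parser could be used to build a nested array from a document that marks up its data with <array> and <scalar> tags. The function name xmlToArray and the handling details are assumptions made for illustration, not the framework's code. The sketch assumes a current PHP version with the xml extension.

```php
<?php
// Hypothetical helper (not part of the framework): converts <array>/<scalar>
// markup into a nested PHP array using the event-based XML parser.
function xmlToArray(string $xml): array
{
    $result = [];
    $stack = [&$result];   // references to the arrays currently being filled
    $scalarName = null;    // name of the <scalar> being read, if any
    $buffer = '';          // character data collected for that scalar

    $parser = xml_parser_create();
    // Keep tag names as written instead of folding them to upper case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: open a new nested array or begin collecting a scalar.
        function ($p, $tag, $attrs) use (&$stack, &$scalarName, &$buffer) {
            if ($tag === 'array') {
                $parent = &$stack[count($stack) - 1];
                $parent[$attrs['name']] = [];
                $stack[] = &$parent[$attrs['name']];
            } elseif ($tag === 'scalar') {
                $scalarName = $attrs['name'];
                $buffer = '';
            }
        },
        // End tag: close the nested array or store the collected scalar.
        function ($p, $tag) use (&$stack, &$scalarName, &$buffer) {
            if ($tag === 'array') {
                array_pop($stack);
            } elseif ($tag === 'scalar') {
                $stack[count($stack) - 1][$scalarName] = $buffer;
                $scalarName = null;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$scalarName, &$buffer) {
            if ($scalarName !== null) {
                $buffer .= $data;   // text may arrive in several chunks
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $result;
}

$xml = '<root><array name="channel"><scalar name="title">Test</scalar>'
     . '<array name="0"><scalar name="author">test author</scalar></array>'
     . '</array></root>';
$data = xmlToArray($xml);
print_r($data);
```

Even this reduced version needs three handlers and an explicit stack, which supports the point that the XML parser does not save work compared with a dedicated reader.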

Let us shed some light on performance. This is anything but straightforward, and all results are open to discussion because they depend on the tool used. There are many tools for the XML format, even if the search is restricted to PHP.

In the following scenario, a simple XML document is loaded and stored in an array.

<?xml version="1.0"?>
<root>
     <array name="channel">
             <scalar name="title">Test</scalar>
             <scalar name="link">about:blank</scalar>
             <scalar name="description" />
             <scalar name="language">de-de</scalar>
             <array name="0">
                     <scalar name="title">test item 1</scalar>
                     <scalar name="pubDate">Do, 9 Jun 2005 13:23:18 +0200</scalar>
                     <scalar name="description"><![CDATA[test item]]></scalar>
                     <scalar name="link" />
                     <scalar name="author">test author</scalar>
             </array>
             <array name="1">
                     <scalar name="title">test item 2</scalar>
                     <scalar name="pubDate">Do, 9 Jun 2005 13:23:53 +0200</scalar>
                     <scalar name="description"><![CDATA[test item]]></scalar>
                     <scalar name="link" />
                     <scalar name="author">test author</scalar>
             </array>
             <array name="2">
                     <scalar name="title">test item 3</scalar>
                     <scalar name="pubDate">Do, 9 Jun 2005 13:24:18 +0200</scalar>
                     <scalar name="description"><![CDATA[test item]]></scalar>
                     <scalar name="link" />
                     <scalar name="author">test author</scalar>
             </array>
             <array name="3">
                     <scalar name="title">Linktest</scalar>
                     <scalar name="pubDate">Do, 9 Jun 2005 13:24:36 +0200</scalar>
                     <scalar name="description"><![CDATA[test item]]></scalar>
                     <scalar name="link">about:blank</scalar>
                     <scalar name="author">test author</scalar>
             </array>
     </array>
</root>

The same data in SML notation:

<channel>
             <title>Test</title>
             <link>about:blank</link>
             <language>de-de</language>
             <0>
                     <title>test item 1</title>
                     <pubDate>Do, 9 Jun 2005 13:23:18 +0200</pubDate>
                     <description>test item</description>
                     <author>test author</author>
             </0>
             <1>
                     <title>test item 2</title>
                     <pubDate>Do, 9 Jun 2005 13:23:53 +0200</pubDate>
                     <description>test item</description>
                     <author>test author</author>
             </1>
             <2>
                     <title>test item 3</title>
                     <pubDate>Do, 9 Jun 2005 13:24:18 +0200</pubDate>
                     <description>test item</description>
                     <author>test author</author>
             </2>
             <3>
                     <title>Linktest</title>
                     <pubDate>Do, 9 Jun 2005 13:24:36 +0200</pubDate>
                     <description>test item</description>
                     <link>about:blank</link>
                     <author>test author</author>
             </3>
</channel>

For comparison, PHP 4 with the standard XML parser is used. Before and after each parser step, the function "microtime" is called to measure the time, and the difference between the two values is reported in seconds. How many milliseconds one implementation is faster than the other is not the point. That cannot be judged from this kind of test, since the procedure is far too rudimentary to produce representative results.

The configuration of the test system can be neglected here, since only the order of magnitude of the difference between the two techniques is of interest.

A further problem is that hand-written PHP code is needed for the XML parser to handle the tags, so inefficient PHP code could influence the results. To rule this out, the XML parser is measured twice: once with and once without this handler implementation.

This test is not meant as a vote for or against the use of the XML parser. Rather, it should show whether there are clear differences between the two variants and, in particular, whether poor performance makes the SML variant unsuitable for the framework.
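The measurement procedure can be sketched as follows. This is an illustrative reconstruction, not the original test script: the function name timeXmlParse, the sample document, and the use of microtime(true) (available from PHP 5) are assumptions.

```php
<?php
// Hypothetical benchmark helper: returns the seconds needed to parse $xml,
// either with handlers attached or with the parser running empty.
function timeXmlParse(string $xml, bool $withHandlers): float
{
    $parser = xml_parser_create();
    $collected = [];

    if ($withHandlers) {
        // Minimal hand-written handlers that do some work per tag and text node.
        xml_set_element_handler(
            $parser,
            function ($p, $tag, $attrs) use (&$collected) { $collected[] = $tag; },
            function ($p, $tag) { /* nothing to do on end tags */ }
        );
        xml_set_character_data_handler(
            $parser,
            function ($p, $data) use (&$collected) { $collected[] = $data; }
        );
    }

    $start = microtime(true);              // timestamp before the parser step
    $ok = xml_parse($parser, $xml, true);
    $elapsed = microtime(true) - $start;   // difference in seconds

    if (!$ok) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $elapsed;
}

$xml = '<root><array name="channel"><scalar name="title">Test</scalar></array></root>';
printf("with handlers:    %.6f s\n", timeXmlParse($xml, true));
printf("without handlers: %.6f s\n", timeXmlParse($xml, false));
```

Single runs on such a small document fluctuate considerably, so repeating the measurement several times, as in a table of results, gives a more reliable picture of the order of magnitude.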

For this modest task this simple test should provide sufficiently accurate results.

XML parser ¹   XML parser ²   SML script   Difference SML to XML ¹   Difference SML to XML ²
0.003685 s     0.001813 s     0.001626 s   -0.001153 s (31%)         -0.000187 s (10%)
0.003410 s     0.001666 s     0.001255 s   -0.002155 s (63%)         -0.000411 s (24%)
0.003367 s     0.001629 s     0.001312 s   -0.002055 s (61%)         -0.000317 s (19%)
0.003394 s     0.001614 s     0.001240 s   -0.002154 s (63%)         -0.000374 s (23%)
0.003381 s     0.001618 s     0.001228 s   -0.002153 s (63%)         -0.000390 s (24%)
0.003386 s     0.001613 s     0.001400 s   -0.001986 s (59%)         -0.000213 s (13%)
0.003387 s     0.001620 s     0.001296 s   -0.002091 s (61%)         -0.000324 s (20%)
0.003658 s     0.001617 s     0.001247 s   -0.002411 s (65%)         -0.000370 s (22%)
0.003678 s     0.001633 s     0.001454 s   -0.002224 s (60%)         -0.000179 s (10%)
0.003725 s     0.001613 s     0.001573 s   -0.002152 s (58%)         -0.000040 s (2%)

¹ XML parser with the hand-written PHP handler code. ² XML parser without handler code.

Considering the above short excerpt of the results, it becomes clear that the XML parser including the PHP code is clearly slower than the SML script. As already mentioned, this might also be due to less efficient programming. But comparing column 2 ("XML parser ²") with column 3 ("SML script") shows that even in this case the SML variant is on par with the XML parser, and in this excerpt slightly faster. This is particularly remarkable because the XML parser in this case ran empty and produced no results at all, since no implementation was given to handle the content. Therefore, the argument that an unfavourable implementation might be slowing down the XML parser can no longer be maintained.

This test provided no indication that the efficiency of the implementation might have a negative effect on the performance of the entire application.

SimpleXML

The use of SimpleXML differs only slightly from that of SML.
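
A minimal sketch of the SimpleXML side might look as follows. The sample document and variable names are assumptions. The entries are wrapped in <item> tags because names such as <0> are not valid XML element names. The SML call is omitted because its API is not shown in this section.

```php
<?php
// Sketch only: a shortened version of the test data, written as valid XML.
$xml = <<<'XML'
<channel>
    <title>Test</title>
    <link>about:blank</link>
    <item>
        <title>test item 1</title>
        <author>test author</author>
    </item>
</channel>
XML;

$channel = simplexml_load_string($xml);
if ($channel === false) {
    throw new RuntimeException('Could not parse the XML document');
}

// The result is an object, so child tags are read as properties.
echo get_class($channel), "\n";
echo $channel->title, "\n";
echo $channel->item[0]->author, "\n";

// A cast is needed wherever a plain string is required.
$title = (string) $channel->title;
```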

The only obvious difference is the type of the return value: the SML variant returns an array, whereas SimpleXML returns an object of type "SimpleXMLElement". Which solution is better is open to debate.

One argument against SimpleXML is that it uses PHP object properties to represent the names of tags and attributes. This is problematic because the characters allowed in names differ between PHP and XML. One example is the character '-', which may not appear in a PHP variable name. Consequently, at the time this document was prepared, it was very difficult to access a tag like <foo-bar> via SimpleXML.
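
In current PHP versions, such a tag can be reached by writing the property name as a string in curly braces. A brief sketch, with element names made up for illustration:

```php
<?php
$config = simplexml_load_string('<config><foo-bar>42</foo-bar></config>');

// $config->foo-bar would be read as ($config->foo) minus the constant bar,
// so the tag name is given as a quoted string instead.
$value = (string) $config->{'foo-bar'};
echo $value, "\n";
```

The syntax works, but it is noticeably less readable than plain array access with a string key.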

Initialization files

Initialization files, usually recognizable by the file extension "ini", are a suitable alternative as long as no complex structures need to be mapped. The format provides a simple syntax, but the presentation of keys can become confusing, especially in deeply nested structures. The format is also not as flexible and powerful as XML. For example, the currently very popular tag notation is missing. Creating a DTD or a similar file that describes the structure of the file format is not possible, for both practical and theoretical reasons. Still, this is no disadvantage: since the primary task of this file format is to store arbitrary structured associative arrays taken from a script or programming language, demanding a static structure, as a DTD does, makes no sense in this scenario. A combination that unites the advantages of both formats would be ideal.

[STORE\0]
type=book
author=Dr. E Xample
[STORE\0\TITLE]
main=Proto matter
subtitle=Alpha et Omega
[STORE\1]
type=cd
author=Barbara Singer
[STORE\1\TITLE]
main=Best-Of
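
A file like this can be read with PHP's parse_ini_string (or parse_ini_file), but the nesting encoded in the section names has to be rebuilt by hand. The following sketch assumes that a backslash in a section name separates nesting levels. The helper name iniToTree is hypothetical.

```php
<?php
// Nowdoc syntax keeps the backslashes in the section names literal.
$ini = <<<'INI'
[STORE\0]
type=book
author=Dr. E Xample
[STORE\0\TITLE]
main=Proto matter
subtitle=Alpha et Omega
[STORE\1]
type=cd
author=Barbara Singer
[STORE\1\TITLE]
main=Best-Of
INI;

// Hypothetical helper: expands "A\B\C" section names into nested arrays.
function iniToTree(string $ini): array
{
    // INI_SCANNER_RAW leaves section names and values uninterpreted.
    $sections = parse_ini_string($ini, true, INI_SCANNER_RAW);
    if ($sections === false) {
        throw new RuntimeException('Could not parse INI data');
    }

    $tree = [];
    foreach ($sections as $path => $values) {
        if (!is_array($values)) {
            continue;   // skip keys that appear outside any section
        }
        // Walk down the tree, creating missing levels on the way.
        $node = &$tree;
        foreach (explode('\\', (string) $path) as $key) {
            if (!isset($node[$key]) || !is_array($node[$key])) {
                $node[$key] = [];
            }
            $node = &$node[$key];
        }
        foreach ($values as $name => $value) {
            $node[$name] = $value;
        }
        unset($node);   // drop the reference before the next section
    }
    return $tree;
}

$tree = iniToTree($ini);
echo $tree['STORE'][0]['TITLE']['main'], "\n";
```

The extra walking code illustrates why deeply nested structures are awkward in this format: the hierarchy exists only as a naming convention.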

JSON

{
    "STORE" : {
        "0" : {
            "type" : "book",
            "author" : "Dr. E Xample",
            "TITLE" : {
                "main" : "Proto matter",
                "subtitle" : "Alpha et Omega"
            }
        },
        "1" : {
            "type" : "cd",
            "author" : "Barbara Singer",
            "TITLE" : {
                "main" : "Best-Of"
            }
        }
    }
}
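
In PHP, such a document can be turned into a nested array with json_decode. The following is a minimal sketch, assuming the json extension is available. The variable names are chosen for illustration.

```php
<?php
$json = <<<'JSON'
{
    "STORE": {
        "0": {
            "type": "book",
            "author": "Dr. E Xample",
            "TITLE": { "main": "Proto matter", "subtitle": "Alpha et Omega" }
        },
        "1": {
            "type": "cd",
            "author": "Barbara Singer",
            "TITLE": { "main": "Best-Of" }
        }
    }
}
JSON;

// The second argument requests associative arrays instead of stdClass objects.
$data = json_decode($json, true);
if (!is_array($data)) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

echo $data['STORE'][0]['TITLE']['main'], "\n";
```

Note that PHP turns the keys "0" and "1" into integer keys, so json_encode would write the STORE entries back as a JSON list rather than an object unless JSON_FORCE_OBJECT is used.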

Using "serialize" and "unserialize"

Like Java, PHP makes it possible to serialize and restore variables, using the functions "serialize" and "unserialize".

a:1:{s:7:"CHANNEL";a:7:{s:5:"TITLE";s:4:"Test";s:4:"LINK";s:11:"about
:blank";s:8:"LANGUAGE";s:5:"en-us";i:0;a:4:{s:5:"TITLE";s:11:"test i
tem 1";s:7:"PUBDATE";s:29:"Th, 9 Jun 2005 13:23:18 +0200";s:11:"DESC
RIPTION";s:9:"Test item";s:6:"AUTHOR";s:11:"Test author";}i:1;a:4:{s:
5:"TITLE";s:11:"Test item 2";s:7:"PUBDATE";s:29:"Th, 9 Jun 2005 13:
23:53 +0200";s:11:"DESCRIPTION";s:9:"Test item";s:6:"AUTHOR";s:11:"
Test author";}i:2;a:4:{s:5:"TITLE";s:11:"Test item 3";s:7:"PUBDATE";s
:29:"Th, 9 Jun 2005 13:24:18 +0200";s:11:"DESCRIPTION";s:9:"Test item
";s:6:"AUTHOR";s:11:"Test author";}i:3;a:5:{s:5:"TITLE";s:8:"Linktest
";s:7:"PUBDATE";s:29:"Th, 9 Jun 2005 13:24:36 +0200";s:11:"DESCRIPTIO
N";s:9:"Test item";s:4:"LINK";s:11:"about:blank";s:6:"AUTHOR";s:11:
"Test author";}}}

This solution is, without doubt, very fast: "unserialize" runs approximately twice as fast as SimpleXML. However, the readability of this data stream is problematic, and it is obviously not a good choice for files that are edited by hand. As good readability is essential for the framework, this solution was not chosen.
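
For completeness, a round trip through both functions can be sketched as follows. The sample array is a shortened stand-in for the test data. The allowed_classes option (available from PHP 7) is an addition in this sketch and not part of the original comparison.

```php
<?php
$config = [
    'CHANNEL' => [
        'TITLE' => 'Test',
        'LINK'  => 'about:blank',
        0       => ['TITLE' => 'test item 1', 'AUTHOR' => 'test author'],
    ],
];

// serialize() produces the compact, length-prefixed stream.
$stream = serialize($config);
echo $stream, "\n";

// unserialize() restores the array. Refusing object creation is a sensible
// precaution when the file could have been modified by someone else.
$restored = unserialize($stream, ['allowed_classes' => false]);
var_dump($restored === $config);
```

Because every string carries its byte length, changing a single value by hand without adjusting the length prefix makes the whole stream unreadable for "unserialize".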
