<abstract>rgb replies to icon</abstract>

On Fri, 30 May 2003, Konstantin Riabitsev wrote:

> 1. Visual simplicity
> This is the top and foremost reason to use the "win.ini" format. E.g.:
>
> [mysection]
> foo=bar
> baz=quux
>
> This is very easy to grok visually, because it limits the formatting
> symbols to just a few. You can see clearly the section name, the
> name of the variable, and the value that is assigned to it. The same
> data in XML would look like so:
>
> <config>
>   <section id="mysection">
>     <set>
>       <variable>foo</variable>
>       <value>bar</value>
>     </set>
>     <set>
>       <variable>baz</variable>
>       <value>quux</value>
>     </set>
>   </section>
> </config>
>
> It is very hard to quickly make out the following information:
> a) Name of the section
> b) Which one is the variable name
> c) Which one is the value

This isn't entirely fair. Or at least, I'd never design the interface
that way. The proper comparison would be:

<config>
  <section id="mysection">
    <foo>bar</foo>
    <baz>quux</baz>
  </section>
</config>

which is the way at least I would encode it (so each variable has a
unique xpath within the hierarchy), possibly with attributes such as:

    <foo type='text'>bar</foo>

if <foo> could hold either integer or string text. I'd say this is just
exactly as easy to read as win.ini, and just about exactly as easy to
maintain. It >>is<< longer, and in this trivial example appears to be
overkill (and in this trivial example may be:-).

Even this example reveals immediate weaknesses in the foo=bar format.
How do you specify that foo=1234 is supposed to be a string, an int, or
a float (in the possibly unlikely event that you have a config variable
that could be any of the above)? This isn't completely crazed as an
example -- suppose Seth decides to make debug=int into
debug=["quite","loud","incredibly noisy","deafening"] to improve human
readability because the number "2" isn't terribly illuminating?
If it were XML he could parse the default type to be integer but add an
attribute allowing it to be specified as text. He could even make debug
level an ATTRIBUTE of the <repository> tag as in

    <repository id="dulug" debug="2">...

and only have to endure lots of tedious debuggery from perhaps the only
repository that is producing a problem.

And then there are the cases where foo=bar, during the evolution of the
program, becomes foo=bar,bari,an,horde or (gah) foo=bar(bari,an,horde)
(which from the look of things in e.g. /etc/exports, /etc/fstab,
elsewhere is an incredibly common occurrence in config files or other
tables of all sorts). XML permits this emerging new hierarchy in one of
several reasonably sane ways:

    <foo id='1'>bar</foo>
    <foo id='2'>bari</foo>
    <foo id='3'>an</foo>
    <foo id='4'>horde</foo>

if it is really a foo vector,

    <foo>
      <init>bar</init>
      <font>bari</font>
      <control>an</control>
      <exit>horde</exit>
    </foo>

in the event that foo suddenly acquired a set of controllable
parameters, each with its own name (a name that might be SHARED with
some other parameter later). Finally, one can even continue to do:

    <foo>bar,bari,an,horde</foo>

and parse them out if one wishes to obfuscate users and absolutely
require the use of an associated manual and have to specify all four in
order to override the default only for the third one.

xmlpath uniqueness more or less imposes a certain degree of order and
sanity. XML used this way encourages the creation of autodocumenting
interfaces. Quick, without looking at source, what are the meanings of
the four fields for cpu in /proc/stat? Hell, I've WRITTEN source parsing
them and using them to generate stats for presentation, and I can't
remember without looking (which may just reflect my general stupidity)
but if they were wrapped the way they are in an xmlsysd return:

    <cpu id="0">
      <user>1025024</user>
      <nice>5541</nice>
      <sys>328659</sys>
      <idle>201026019</idle>
    </cpu>

I don't have to.
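
To make the typed-attribute and self-describing-tag points concrete, here
is a minimal sketch using Python's standard xml.etree.ElementTree. It is
an illustration only, not code from xmlsysd or from the tool under
discussion: the CONFIG text, the CONVERTERS table, the read_value helper,
and the "missing type attribute means int" convention are all assumptions
made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical config text; tag names follow the examples in this message.
CONFIG = """
<config>
  <section id="mysection">
    <foo type="text">1234</foo>
    <baz>42</baz>
  </section>
</config>
"""

# Assumed convention: the type attribute selects a converter.
CONVERTERS = {"int": int, "float": float, "text": str}


def read_value(root, path, default_type="int"):
    """Find one element by path and convert its text per its type attribute."""
    node = root.find(path)
    if node is None:
        raise KeyError(path)
    kind = node.get("type", default_type)  # no attribute -> default type
    return CONVERTERS[kind](node.text.strip())


root = ET.fromstring(CONFIG)
foo = read_value(root, "./section[@id='mysection']/foo")  # kept as a string
baz = read_value(root, "./section[@id='mysection']/baz")  # parsed as an int

# The cpu block quoted above: the tags themselves name the fields.
CPU = (
    '<cpu id="0"><user>1025024</user><nice>5541</nice>'
    "<sys>328659</sys><idle>201026019</idle></cpu>"
)
cpu = {child.tag: int(child.text) for child in ET.fromstring(CPU)}
```

The same foo element can later switch between text and integer without
touching the lookup path, and the cpu dictionary is keyed by the tag
names rather than by column position.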
I would argue that any human could read the output from xmlsysd and
have a very good idea at a glance what the state of the system is
WITHOUT a UI of any sort. Those fields where that isn't true (e.g.
<shmbufs>) it is likely to be more because of ignorance of what shared
memory is and what the fields do operationally in the first place than
because one cannot figure out what the fields are called or their place
in the stats hierarchy. Compared to raw /proc itself it is a limpid
pool, if still only halfway to "cooked" values in many cases.

So, in general, I'd say that if xml tags are well chosen it is
(exactly!) as readable as win.ini even for simple files, but agree that
its real advantage comes forth when the file grows more complex or needs
to "suddenly" contain entries your parser isn't equipped to handle or
that require a second stage of parsing (e.g. csv, wssv in a vector).

The trouble is, it is difficult at the beginning of a project to know
just when that will occur. At the beginning, sure, xml will seem like
overkill -- nuking fleas -- when all the project will EVER need is two
or three variables. Of course, in this case the variables should
probably be set via command line options anyway; configuration files
themselves may be overkill. Alas, simple tools grow complex over time --
I'll bet even ls was simple once upon a time.

So your points are well taken, but I honestly think that the moment to
think xml is right when you make the first [toplevel] win.ini category
OR your first set of kludgily named related variables OR (worse) a
vector of values parsed in a second stage. You've just introduced a
hierarchical configurational model, like it or not, in all of these
cases, and are at the break even point where any additional complexity
in win.ini will become obfuscated. Only if you are certain that you will
NEVER need more than what amounts to a hash table (hash,value) to get
all your values in is win.ini not likely to be outgrown.
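
The "second stage of parsing" point can be sketched in a few lines of
Python, assuming the standard configparser and xml.etree.ElementTree
modules. The INI_TEXT and XML_TEXT strings reuse the foo=bar,bari,an,horde
example from earlier in this message; the variable names are made up for
the illustration and this is not anybody's actual config reader.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical win.ini-style file where one value has grown into a vector.
INI_TEXT = """
[mysection]
foo=bar,bari,an,horde
"""

# The same data with one element per vector entry.
XML_TEXT = """
<config>
  <section id="mysection">
    <foo id="1">bar</foo>
    <foo id="2">bari</foo>
    <foo id="3">an</foo>
    <foo id="4">horde</foo>
  </section>
</config>
"""

ini = configparser.ConfigParser()
ini.read_string(INI_TEXT)
raw = ini["mysection"]["foo"]  # the parser hands back one opaque string
# Second stage: the comma convention lives in our code, not in the format.
ini_values = [item.strip() for item in raw.split(",")]

root = ET.fromstring(XML_TEXT)
# The vector is already structure; no private delimiter rules are needed.
xml_values = [node.text for node in root.findall("./section[@id='mysection']/foo")]
```

Both paths end up with the same four values, but only the win.ini path
depends on a splitting rule that has to be documented somewhere outside
the file.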
Your other observation (regarding comments) is well taken. XML files
have to be a lot more disciplined than files you parse out yourself.
Still, there is a reason for that. You may or may not agree with the
reason, but it is there and the discipline it imposes may in the long
run pay back with the improved data design it forces on you like it or
not. XML is a wee bit fascist compared to e.g. html that doesn't care if
you close tags and so forth...

> Keeping these points in mind, I would actually advocate picking a
> simple/sane XML file format and going with it. XML exists for
> representing complex data structures with the help of human-editable
> Markup Language, which may in the future require eXtending, so it
> seems to me that it is actually the best choice for the job.
>
> Writing yet-another-config-file, e.g. using %silly delimiters might
> quickly backfire if only for the same reasons why I can no longer
> write "cleaned up %post" in the spec-file changelog entries. ;)

Exactly. Spec being yet another example of something where it probably
seemed like, and even was, a perfectly good idea back around 1.0 or even
2.0 -- but then things got complex, fields were added, fields of
different TYPE were added, a specialized parser with its own rules was
added, and damn -- now it requires a huge amount of documentation
because it isn't clear that those rules were all chosen hierarchically
with open/close delimiters.

XML is a pain, no doubt, but if you use it when you don't really have to
for simple stuff, it will save you a lot more pain later when you want
to add a

    <post>
      run install_everything
    </post>
    <comment date="05/30/03 or whatever you like">"This is a comment about stuff in the <post> section above"</comment>

But <sigh>spec ain't xml (yet)</sigh>

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx