On 29 May 2003, seth vidal wrote:
> I'm not sure where I come down on this.
>
> other comments from the peanut-gallery? :)

XML! XML! Rah! Rah! (predictably enough:-)

Seriously, a well-designed set of tags is just as easy to edit by hand as a well-designed plaintext config file -- maybe even EASIER, as hierarchical dependence is enforced by the open/close tag rules. It is also very easy to read if it is properly indented (much like html or sgml, which people write all the time). The only tricky part is to make sure that you write your documentation at the same time, so that you have a clean description in human English of what each tag delimits and which tags are subtags. Something like:

  <?xml version="1.0"?>
  <yum version="2.0.0">
    <cachedir>/var/cache/yum</cachedir>
    <debuglevel>2</debuglevel>
    <logfile>/var/log/yum.log</logfile>
    <distroverpkg>Frog Linux</distroverpkg>
    <releasever>$releasever</releasever>
    <basearch>$basearch</basearch>
    <repository id="base">
      <name>Frog Base</name>
      <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/</url>
    </repository>
    <repository id="updates">
      <name>Frog Updates</name>
      <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/$basearch/</url>
    </repository>
  </yum>

works, for example. Or use <a href="http://mirror.dulug.duke.edu/etc...">Frog Base</a> for a more recognizable/portable url format.

As to why, I think I've gone over this several times in other forums (fori? forae? damn, can't remember my Latin), but for the record in this one, in order of importance:

a) If XMLish becomes a universal standard for >>all<< linux configuration, and if people develop the habit of >>always<< documenting the XMLish API as they write it, many, many good things will come of it. Including:

b) Consistency. Configurational grouping is now haphazard. Some configuration files (like yum's) use a variable-assignment format. However, the same variable names are reused.
To differentiate and delineate them, named blocks are used. However, the user does not generally know the named block "rules". Is it the empty line between blocks that counts? If they indent blocks, will it matter? What about whitespace and tabs? Do they need to put spaces to the right of the = sign in quotes? For that matter, do variables have implicit types (so version=1a will barf, where version=1 will not)? XML spells out most of that (no; no; only inside "strings"; there is no = sign, but no; and no, though this is still a design decision on the parsing side and must be documented). Line continuation, termination, nesting rules: all consistent and unambiguous.

c) Extensibility. The way XML is parsed, extra tags will generally be ignored. Adding features (new tags) to a config file will thus often not break older versions of the software, as long as one doesn't rearrange or change the meaning of existing tags.

Also, sooner or later in a complex project one needs to add e.g. a vector of values. In one sense, yum already does this -- a vector of repository description blocks -- but one may need e.g. a vector of times the repository will be open to a particular client (imagining an unlikely time when load is so great that clients have to be allocated particular time block(s) on a round-robin basis throughout the day to defer their requests to). XML provides a consistent way of doing this that is neither time[0]=x time[1]=y nor time="x,y,z..." nor time=x y z (tab-delimited, Seth's favorite:-) nor any of the other bastardized ways people have decided to solve a problem like this over the years.

Another point (as taught by friend Icon) is that many variables require "modifiers" or "labels"; XML gives tags their own internal parameters, as in <tag parm1="very" parm2="cool">parameters</tag>. Again, in a variable=value paradigm what can you do? tag=parameters tagparm1=very tagparm2=cool is ugly, and other solutions are even worse.
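To make the extensibility, vector, and label points concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree (not libxml2) that reads a trimmed copy of the example config into a small control struct. The dataclass names, the load_config function, and the extra <mirrorlist> tag (with its example.invalid URL) are all hypothetical, not part of yum; the extra tag stands in for a feature added by some newer version that this older reader knows nothing about.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Trimmed copy of the example config. <mirrorlist> is a HYPOTHETICAL tag,
# standing in for a feature added later that this reader knows nothing about.
CONFIG = """<?xml version="1.0"?>
<yum version="2.0.0">
  <cachedir>/var/cache/yum</cachedir>
  <debuglevel>2</debuglevel>
  <repository id="base">
    <name>Frog Base</name>
    <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/</url>
  </repository>
  <repository id="updates">
    <name>Frog Updates</name>
    <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/$basearch/</url>
    <mirrorlist>http://mirrors.example.invalid/frog</mirrorlist>
  </repository>
</yum>
"""


@dataclass
class Repository:
    """One <repository> block; repo_id comes from its id="..." attribute."""
    repo_id: str
    name: str
    url: str


@dataclass
class YumConfig:
    """Hypothetical control struct for the application (or an editor)."""
    cachedir: str
    debuglevel: int
    repositories: List[Repository] = field(default_factory=list)


def load_config(text: str) -> YumConfig:
    root = ET.fromstring(text)
    cfg = YumConfig(
        cachedir=root.findtext("cachedir", default=""),
        # The parser hands back text; giving it a type is our decision here.
        debuglevel=int(root.findtext("debuglevel", default="0")),
    )
    # A "vector" is just repeated elements: no time[0]=x, time[1]=y tricks.
    for repo in root.findall("repository"):
        cfg.repositories.append(
            Repository(
                repo_id=repo.get("id", ""),  # the label lives on the tag itself
                name=repo.findtext("name", default=""),
                url=repo.findtext("url", default=""),
            )
        )
    # Children we never ask for (e.g. <mirrorlist>) are simply ignored,
    # so a newer config file does not break this older reader.
    return cfg


if __name__ == "__main__":
    for repo in load_config(CONFIG).repositories:
        print(repo.repo_id, repo.name, repo.url)
```

Nothing in the reader had to know about blank lines, indentation, or tabs; the open/close tag rules carry all of that.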
d) XMLish conf files, with indentation and comments, are actually remarkably easy to edit by hand. I do it all the time. I >>prefer<< to do it. Even without tag documentation, I at least know the significant elements of XML and thus know that I can e.g. rearrange things to look pretty without breaking anything. Now that I've found the key macro definitions in e.g. .gconf, I'll almost certainly edit them directly by hand instead of with gconf-editor, if only because gconf-editor produces a %gconf.xml that is horribly ugly and unindented for no particularly good reason (except that as a tool it is basically only half finished).

e) However, the final reason to use XMLish is (as already noted) that it makes it very easy to write a UI to create, edit, and display the configuration files. XMLish is what, a style sheet or so away from being directly renderable in any browser or browser form? libxml2 makes it easy (tedious, but easy) to parse an XML file, extract tag contents and labels, pack the values away into e.g. a control struct for the application OR editor, permit editing, and write the result back into the file. Curiously, libxml2 >>will<< automagically indent (note the output from xmlsysd, indented by library calls and not me), and this indentation is the key to d). Disk space is ludicrously cheap and plentiful, and saving a few bytes per line at the expense of readability is just ludicrous.

Let me indicate where I'm coming from. Writing xmlsysd (and procstatd before it) I have, naturally, had to parse /proc. /proc is a perfect example of Diabolical Evil. Every human who has ever contributed a kernel component or module that writes to /proc has used their own personal specially clever file layout for packing the data it presents. /proc/meminfo presents (most of) its data twice -- inexplicably, the first form is the one humans generally see, except that bytes are silently converted to K by "free" and the +/- buffers line is worked out.
The second form humans NEVER see, but the same values presented exactly in the first three lines are re-presented in a "variable: value size" format in less accurate K! Then there is /proc/stat, where one can see three or four different ways of packing data (none of the values labelled in any case):

  variable value1 value2 value3
  variable value
  variable: (a1,a2):(x1,x2,x3...) (b1,b2):(y1,y2,y3...) ...

(where the latter makes me want to tear out my little remaining hair). Finally there are straight tables (/proc/net/dev). Sometimes the tables actually have column labels -- this one does. /proc/PID/maps (e.g.) does not.

Working our way down through /etc we see more of the same, only (if it were possible) worse. /etc/fstab is a straight whitespace-delimited table, but a number of the table fields are vectors. /etc/passwd is a :-delimited table where whitespace in a field is data. /etc/group is a :-delimited table with an optional comma-delimited, variable-length vector as one field. /etc/exports is a table with diabolical rules regarding table entry; each line is, what,

  mount client1(opt1,opt2...) client2(opt3,opt4...) ...

a format that reduces even skilled and experienced sysadmins to cursing when they brainlessly insert whitespace in the wrong place. And we won't TOUCH the nameserver tables, where leaving off a ; or using a space instead of a tab is death.

Even the xinetd designers blew it. There they were, redesigning everything, and they opted for yet another variant of the generic format used by yum now, but one with a meaningless leading term ("service"), bracket delimiters (in spite of the fact that it is already FILE delimited) and

  variable [=,+=...] value [value] [value]

otherwise. With this underpinning, it is little wonder that people look at Unix with despair -- expert friendly doesn't begin to describe it.
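For a sense of what "different rules for every file" means in practice, here is a minimal Python sketch of hand-written parsers for three of the formats described above: /etc/passwd, /etc/group, and a /proc/stat-style "variable value1 value2 ..." line. The function names and the sample lines are made up for illustration, and the parsers assume well-formed input with none of the edge cases real files can contain.

```python
from typing import Dict, List, Union

# Each format needs its own hand-written rules. These are minimal sketches
# that assume well-formed lines (no comments, no blank lines, no quirks).


def parse_passwd_line(line: str) -> Dict[str, str]:
    """/etc/passwd: exactly seven ':'-delimited fields; spaces are data."""
    name, passwd, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return {"name": name, "passwd": passwd, "uid": uid, "gid": gid,
            "gecos": gecos, "home": home, "shell": shell}


def parse_group_line(line: str) -> Dict[str, Union[str, int, List[str]]]:
    """/etc/group: four ':'-delimited fields, the last an optional,
    variable-length, comma-delimited vector of member names."""
    name, passwd, gid, members = line.rstrip("\n").split(":")
    return {"name": name, "passwd": passwd, "gid": int(gid),
            # "".split(",") gives [""], so drop empties to get [] for no members.
            "members": [m for m in members.split(",") if m]}


def parse_stat_line(line: str) -> Dict[str, List[int]]:
    """/proc/stat-style 'variable value1 value2 ...': whitespace-delimited,
    and nothing in the line says what each value means."""
    key, *values = line.split()
    return {key: [int(v) for v in values]}


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    print(parse_passwd_line("alice:x:500:500:Alice Q. User:/home/alice:/bin/bash"))
    print(parse_group_line("wheel:x:10:root,alice"))
    print(parse_stat_line("cpu  2255 34 2290 22625563"))
```

The three functions share essentially nothing: the delimiter, the field count, and the treatment of an empty vector all differ, and a tool that also writes these files back has to reproduce each convention exactly.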
A fairer assessment would be that it is all part of a diabolical plot to ensure that Unix/Linux sysadmins remain scarce and make a lot of money, because you damn sure gotta be a genius in raw IQ just to keep track of what lives where, in what format, and what is "meaningful" in each format. No WONDER UI tools to manage all of this that actually work are rare -- think of the DIFFERENT rules that have to be encapsulated just to parse any table and recreate it in a viable form, and I'm not talking about checking or validating the actual data structure itself!

As you can see, this is a bit of a religious issue with me. If every new tool added to linux were rigorously written with a fully documented XML non-interactive interface (whatever the interface might be), and every major rewrite of every old tool grafted on an XML replacement for the outta-my-ass interface that is already there, then in perhaps 3-5 years of moderate pain linux could emerge on the other side as the first pure-XML API OS in the world. It would actually be POSSIBLE to write a "supertool" for systems administration, even of a network, that wasn't a horrible and unstable kludge and that totally fronted all the major configurational entities, while still retaining the lovely ascii text flatfile human-manageable character that has distinguished unix for its whole lifetime.

However, in the open source community this process has to begin one application at a time, and people writing (or rewriting) the applications have to communicate with other people and convert them with their religious zeal and passion and by the pure example of their work. Once there is a critical mass of components, supertools will start to emerge, because they can be built incrementally and trivially organized underneath a common GUI shell, much as gnome is endeavoring to do now at the user level.

So go thou, and preach the good word, brother...;-)

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx