[Yum] script run idea

rgb@xxxxxxxxxxxx (Robert G. Brown) · Fri, 30 May 2003 17:18:42 -0400 (EDT)

<abstract>rgb replies to icon</abstract>

On Fri, 30 May 2003, Konstantin Riabitsev wrote:

> 1. Visual simplicity
> This is the top and foremost reason to use the "win.ini" format. E.g.:
> 
> [mysection]
> foo=bar
> baz=quux
> 
> This is very easy to grok visually, because it limits the formatting 
> symbols to just a few. You can see clearly the section name, the 
> name of the variable, and the value that is assigned to it. The same 
> data in XML would look like so:
> 
> <config>
>   <section id="mysection">
>    <set>
>     <variable>foo</variable>
>     <value>bar</value>
>    </set>
>    <set>
>     <variable>baz</variable>
>     <value>quux</value>
>    </set>
>   </section>
> </config>
> 
> It is very hard to quickly make out the following information:
>   a) Name of the section
>   b) Which one is the variable name
>   c) Which one is the value

This isn't entirely fair.  Or at least, I'd never design the interface
that way.  The proper comparison would be:

 <config>
   <section id="mysection">
    <foo>bar</foo>
    <baz>quux</baz>
   </section>
 </config>

which is the way at least I would encode it (so each variable has a
unique xpath within the hierarchy), possibly with attributes such as:

  <foo type='text'>bar</foo>

if <foo> could hold either integer or string text.  I'd say this is just
exactly as easy to read as win.ini, and just about exactly as easy to
maintain.  It >>is<< longer, and in this trivial example appears to be
overkill (and in this trivial example may be:-).
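
A minimal sketch of reading the compact layout above, assuming Python and
its standard xml.etree.ElementTree module (an illustrative choice, not
something this thread settles on).  The helper name load_section is
hypothetical.  It shows that each variable is reachable by a unique path
of the form section[@id=...]/name:

```python
import xml.etree.ElementTree as ET

CONFIG = """\
<config>
  <section id="mysection">
    <foo type="text">bar</foo>
    <baz>quux</baz>
  </section>
</config>
"""

def load_section(xml_text, section_id):
    """Return {variable name: text value} for one <section>."""
    root = ET.fromstring(xml_text)
    # Select the section by its id attribute; children are the variables.
    section = root.find(f"./section[@id='{section_id}']")
    if section is None:
        raise KeyError(section_id)
    return {child.tag: (child.text or "").strip() for child in section}

settings = load_section(CONFIG, "mysection")
print(settings)  # {'foo': 'bar', 'baz': 'quux'}
```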

Even this example reveals immediate weaknesses in the foo=bar format.
How do you specify that foo=1234 is supposed to be a string, an int, or
a float (in the possibly unlikely event that you have a config variable
that could be any of the above)?  This isn't completely crazed as an
example -- suppose Seth decides to make debug=int into
debug=["quite","loud","incredibly noisy","deafening"] to improve human
readability because the number "2" isn't terribly illuminating?  If it
were XML he could parse the default type to be integer but add an
attribute allowing it to be specified as text.  He could even make debug
level an ATTRIBUTE of the <repository> tag as in <repository id="dulug"
debug="2">... and only have to endure lots of tedious debuggery from
perhaps the only repository that is producing a problem.
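
One way this could look, as a sketch only: the global <debug> element,
the type attribute values, the repository id "other", and the rule that
a name's position in the list is its integer level are all assumptions
made for illustration, not yum's actual configuration schema.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: the position of a name in this list is its integer level.
DEBUG_NAMES = ["quite", "loud", "incredibly noisy", "deafening"]

CONFIG = """\
<config>
  <debug type="text">loud</debug>
  <repository id="dulug" debug="2"/>
  <repository id="other"/>
</config>
"""

def parse_debug(value, value_type="int"):
    """Convert a debug setting to an integer level (integer is the default type)."""
    if value_type == "text":
        return DEBUG_NAMES.index(value)
    return int(value)

def repo_debug_levels(xml_text):
    """Return {repository id: debug level}, falling back to the global <debug>."""
    root = ET.fromstring(xml_text)
    global_debug = root.find("debug")
    default = 0
    if global_debug is not None:
        default = parse_debug(global_debug.text.strip(),
                              global_debug.get("type", "int"))
    levels = {}
    for repo in root.findall("repository"):
        raw = repo.get("debug")  # per-repository override, if present
        levels[repo.get("id")] = default if raw is None else parse_debug(raw)
    return levels

print(repo_debug_levels(CONFIG))  # {'dulug': 2, 'other': 1}
```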

And then there are the cases where foo=bar, during the evolution of the
program, becomes

 foo=bar,bari,an,horde

or (gah)

 foo=bar(bari,an,horde)

(which from the look of things in e.g. /etc/exports, /etc/fstab,
elsewhere is an incredibly common occurrence in config files or other
tables of all sorts).  XML permits this emerging new hierarchy in one of
several reasonably sane ways:

  <foo id='1'>bar</foo>
  <foo id='2'>bari</foo>
  <foo id='3'>an</foo>
  <foo id='4'>horde</foo>

if it is really a foo vector, or

  <foo>
   <init>bar</init>
   <font>bari</font>
   <control>an</control>
   <exit>horde</exit>
  </foo>

in the event that foo suddenly acquired a set of controllable
parameters, each with its own name (a name that might be SHARED with
some other parameter later).  

Finally, one can even continue to do:

  <foo>bar,bari,an,horde</foo>

and parse them out, if one wishes to confuse users, absolutely require
the use of an associated manual, and make them specify all four values
in order to override the default for only the third one.  Xpath
uniqueness more or less imposes a certain degree of order and sanity.
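
A short sketch of parsing the first two layouts, assuming Python's
standard xml.etree.ElementTree.  The <section> wrapper, the helper names
foo_vector and foo_params, and the DEFAULTS table are hypothetical.  The
point is that named children let a user override one parameter without
restating the others:

```python
import xml.etree.ElementTree as ET

VECTOR = """\
<section>
  <foo id="1">bar</foo>
  <foo id="2">bari</foo>
  <foo id="3">an</foo>
  <foo id="4">horde</foo>
</section>
"""

# Hypothetical built-in defaults for the named-parameter layout.
DEFAULTS = {"init": "bar", "font": "bari", "control": "an", "exit": "horde"}

def foo_vector(xml_text):
    """Return the <foo> values as a list, ordered by their id attribute."""
    root = ET.fromstring(xml_text)
    items = sorted(root.findall("foo"), key=lambda e: int(e.get("id")))
    return [e.text for e in items]

def foo_params(xml_text, defaults):
    """Overlay the named children of <foo> on top of the defaults."""
    params = dict(defaults)
    for child in ET.fromstring(xml_text):
        params[child.tag] = child.text
    return params

print(foo_vector(VECTOR))
# Override only the third parameter; the rest keep their defaults.
print(foo_params("<foo><control>quiet</control></foo>", DEFAULTS))
```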

XML used this way encourages the creation of autodocumenting interfaces.
Quick, without looking at source, what are the meanings of the four
fields for cpu in /proc/stat?  Hell, I've WRITTEN source parsing them
and using them to generate stats for presentation, and I can't remember
without looking (which may just reflect my general stupidity) but if
they were wrapped the way they are in an xmlsysd return:

      <cpu id="0">
        <user>1025024</user>
        <nice>5541</nice>
        <sys>328659</sys>
        <idle>201026019</idle>
      </cpu>

I don't have to.  I would argue that any human could read the output
from xmlsysd and have a very good idea at a glance what the state of the
system is WITHOUT a UI of any sort.  For those fields where that isn't
true (e.g. <shmbufs>), it is likely to be more because of ignorance of what
shared memory is and what the fields do operationally in the first place
than because one cannot figure out what the fields are called or their
place in the stats hierarchy.  Compared to raw /proc itself it is a
limpid pool, if still only halfway to "cooked" values in many cases.
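
To make the contrast concrete, here is a sketch in Python (standard
library only).  PROC_LINE is a made-up line built from the same four
numbers in the /proc/stat order (user, nice, system, idle), and both
function names are hypothetical.  The positional parser has to carry the
field names itself, while the XML carries them in the data:

```python
import xml.etree.ElementTree as ET

XML_CPU = """\
<cpu id="0">
  <user>1025024</user>
  <nice>5541</nice>
  <sys>328659</sys>
  <idle>201026019</idle>
</cpu>
"""

# Hypothetical positional line using the same values as the XML above.
PROC_LINE = "cpu0 1025024 5541 328659 201026019"

def cpu_from_xml(text):
    """Field names come straight from the tags."""
    cpu = ET.fromstring(text)
    return {field.tag: int(field.text) for field in cpu}

def cpu_from_proc(line):
    """Field names must be supplied by the parser, in the right order."""
    _name, *values = line.split()
    return dict(zip(("user", "nice", "sys", "idle"), map(int, values)))

print(cpu_from_xml(XML_CPU) == cpu_from_proc(PROC_LINE))  # True
```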

So, in general, I'd say that if xml tags are well chosen it is
(exactly!) as readable as win.ini even for simple files, but agree that
its real advantage comes forth when the file grows more complex or needs
to "suddenly" contain entries your parser isn't equipped to handle or
that require a second stage of parsing (e.g. csv, wssv in a vector).

The trouble is, it is difficult at the beginning of a project to know
just when that will occur.  At the beginning, sure, xml will seem like
overkill -- nuking fleas -- when all the project will EVER need is two
or three variables.  Of course, in this case the variables should
probably be set via command line options anyway; configuration files
themselves may be overkill.  Alas, simple tools grow complex over time
-- I'll bet even ls was simple once upon a time.

So your points are well taken, but I honestly think that the moment to
think xml is right when you make the first [toplevel] win.ini category
OR your first set of kludgily named related variables OR (worse) a
vector of values parsed in a second stage.  You've just introduced a
hierarchical configurational model, like it or not, in all of these
cases, and are at the break even point where any additional complexity
in win.ini will become obfuscated.  Only if you are certain that you
will NEVER need more than what amounts to a hash table (key,value) to
get all your values in is win.ini not likely to be outgrown.
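
As an illustration of the "hash table plus second stage" situation, a
sketch using Python's standard configparser module (chosen here only as
a familiar win.ini-style reader).  The section contents reuse the
examples above:

```python
import configparser

INI = """\
[mysection]
foo = bar,bari,an,horde
baz = quux
"""

parser = configparser.ConfigParser()
parser.read_string(INI)

# Stage one: the format only ever yields a flat name -> string table.
flat = dict(parser["mysection"])
print(flat)  # {'foo': 'bar,bari,an,horde', 'baz': 'quux'}

# Stage two: any structure inside a value needs its own ad hoc parser,
# and the meaning of each position lives only in the documentation.
foo_items = flat["foo"].split(",")
print(foo_items)  # ['bar', 'bari', 'an', 'horde']
```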

Your other observation (regarding comments) is well taken.  XML files
have to be a lot more disciplined than files you parse out yourself.
Still, there is a reason for that.  You may or may not agree with the
reason, but it is there and the discipline it imposes may in the long
run pay back with the improved data design it forces on you like it or
not.  XML is a wee bit fascist compared to e.g. html that doesn't care
if you close tags and so forth...
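
A small sketch of that strictness, assuming Python's standard
xml.etree.ElementTree (the helper name is_well_formed is hypothetical).
A conforming XML parser rejects a file with an unclosed tag outright
rather than guessing what was meant:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<foo><init>bar</init></foo>"))  # True
print(is_well_formed("<foo><init>bar<init></foo>"))   # False
```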

> Keeping these points in mind, I would actually advocate picking a 
> simple/sane XML file format and going with it. XML exists for 
> representing complex data structures with the help of human-editable 
> Markup Language, which may in the future require eXtending, so it 
> seems to me that it is actually the best choice for the job.
> 
> Writing yet-another-config-file, e.g. using %silly delimiters might 
> quickly backfire if only for the same reasons why I can no longer 
> write "cleaned up %post" in the spec-file changelog entries. ;)

Exactly.  Spec being yet another example of something where it probably
seemed like, and even was, a perfectly good idea back around 1.0 or even
2.0 -- but then things got complex, fields were added, fields of
different TYPE were added, a specialized parser with its own rules was
added, and damn -- now it requires a huge amount of documentation
because it isn't clear that those rules were all chosen hierarchically
with open/close delimiters.  XML is a pain, no doubt, but if you use it
when you don't really have to for simple stuff, it will save you a lot
more pain later when you want to add a

<post>
run install_everything
</post>

<comment date="05/30/03 or whatever you like">"This is a comment about
stuff in the &lt;post&gt; section above"</comment>

But <sigh>spec ain't xml (yet)</sigh>

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx