[Yum] script run idea

rgb@xxxxxxxxxxxx (Robert G. Brown) · Fri, 30 May 2003 08:25:16 -0400 (EDT)

On 29 May 2003, seth vidal wrote:

> I'm not sure where I come down on this.
> 
> other comments from the peanut-gallery? :)

XML!  XML!  Rah!  Rah!

(predictably enough:-)

Seriously, a well designed set of tags is just as easy to human edit as
a well designed plaintext config file -- maybe even EASIER as
hierarchical dependence is enforced by the open/close tag rules.  It is
also very easy to read if it is properly indented (much like html or
sgml, which people write all the time).

The only tricky part is to make sure that you write your documentation
at the same time, so that you have a clean description in human english
of what each tag delimits and which tags are subtags.

Something like:

<?xml version="1.0"?>
<yum version="2.0.0">
  <cachedir>/var/cache/yum</cachedir>
  <debuglevel>2</debuglevel>
  <logfile>/var/log/yum.log</logfile>
  <distroverpkg>Frog Linux</distroverpkg>
  <releasever>$releasever</releasever>
  <basearch>$basearch</basearch>

  <repository id="base">
    <name>Frog Base</name>
    <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/</url>
  </repository>

  <repository id="updates">
    <name>Frog Updates</name>
    <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/$basearch/</url>
  </repository>

</yum>

works, for example.  Or use 
    <a href="http://mirror.dulug.duke.edu/etc...">Frog Base</a>
for a more recognizable/portable url format.
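
To make the "easy to parse" claim concrete, here is a minimal sketch
that reads a config of the shape above using Python's standard
xml.etree.ElementTree module.  The tag names come from the example; the
load_config() helper and the dictionary layout it returns are my own
assumptions for illustration, not anything yum actually does.

```python
import xml.etree.ElementTree as ET

# Sample config modeled on the example above (closing tags matched).
SAMPLE = """<?xml version="1.0"?>
<yum version="2.0.0">
  <cachedir>/var/cache/yum</cachedir>
  <debuglevel>2</debuglevel>
  <repository id="base">
    <name>Frog Base</name>
    <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/</url>
  </repository>
  <repository id="updates">
    <name>Frog Updates</name>
    <url>http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/$basearch/</url>
  </repository>
</yum>
"""


def load_config(text):
    """Hypothetical helper: turn the XML text into a plain dictionary."""
    root = ET.fromstring(text)
    config = {
        "version": root.get("version"),
        "cachedir": root.findtext("cachedir"),
        # Types are still a parser-side decision: we choose int here.
        "debuglevel": int(root.findtext("debuglevel", default="0")),
        "repositories": {},
    }
    # Each <repository> block is keyed by its id attribute.
    for repo in root.findall("repository"):
        config["repositories"][repo.get("id")] = {
            "name": repo.findtext("name"),
            "url": repo.findtext("url"),
        }
    return config


config = load_config(SAMPLE)
```

Note that a mismatched close tag in SAMPLE would make ET.fromstring()
raise a ParseError, which is exactly the enforcement of open/close
rules mentioned earlier.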

As to why, I think I've gone over this several times in other forums
(fori? forae? damn, can't remember my Latin) but for the record in this
one, in order of importance:

  a) If XMLish becomes a universal standard for >>all<< linux
configuration, and if people develop the habit of >>always<< documenting
the XMLish API as they write it, many, many good things will come of it.
Including:

  b) Consistency.  Configurational grouping is now haphazard.  Some
configuration files (like yum's) use a variable-assignment format.
However, the same variable names are reused.  To differentiate and
delineate them, named blocks are used.  However, the user does not
generally know the named block "rules".  Is it the empty line between
blocks that counts?  If they indent blocks will it matter?  What about
whitespace and tabs?  Do they need to put spaces to the right of the =
sign in quotes?  For that matter, do variables have implicit types (so
version=1a will barf, where version=1 will not)?  XML spells out most of
that (no,no,only inside "strings",no = sign but no,no but this is still
a design decision on the parsing side and must be documented).  Line
continuation, termination, nesting rules, all consistent and
unambiguous.

  c) Extensibility.  The way xml is parsed, extra tags will generally be
ignored.  Adding features (new tags) to a config file will thus often
not break older versions of the software, as long as one doesn't
rearrange or change the meaning of existing tags.  Also, sooner or later
in a complex project one needs to add e.g. a vector of values.  In one
sense, yum already does this -- a vector of repository description
blocks, but one may need e.g. a vector of times the repository will be
open to a particular client (imagining an unlikely time when load is so
great that clients have to be allocated particular time block(s) on a
round-robin basis throughout the day to defer their requests to).  XML
provides a consistent way of doing this that is neither 

time[0]=x
time[1]=y

nor

time="x,y,z..."

nor

time=x	y	z

(tab-delimited, Seth's favorite:-) nor any of the other bastardized ways
people have decided to solve a problem like this over the years.
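
As a rough sketch of both points (vectors and ignored extra tags), the
XML way is simply to repeat the tag.  The <time> and <mirrorweight>
tags below are hypothetical, invented for this illustration; neither
is a real yum option.  Python's standard xml.etree.ElementTree is used
as the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical <time> and <mirrorweight> tags; neither exists in yum.
SAMPLE = """<repository id="base">
  <name>Frog Base</name>
  <time>02:00</time>
  <time>14:00</time>
  <time>22:00</time>
  <mirrorweight>5</mirrorweight>
</repository>
"""

repo = ET.fromstring(SAMPLE)

# A vector is just the same tag repeated; findall() keeps document order.
times = [t.text for t in repo.findall("time")]

# An "older" reader that only asks for <name> never notices the new tags.
name = repo.findtext("name")
```

The older reader keeps working because it looks tags up by name rather
than by position, so the new tags only matter to code that asks for
them.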

Another point (as taught by friend Icon) is that many variables require
"modifiers" or "labels"; xml has the facility for tags to have their
internal <tag parm1="very" parm2="cool">parameters</tag>.  Again, in a
variable=value paradigm what can you do?  

tag=parameters
tagparm1=very
tagparm2=cool

is ugly, and other solutions are even worse.
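
For comparison, a minimal sketch of reading the tag above with Python's
standard xml.etree.ElementTree: the value and its modifiers arrive
together, already separated, with no naming convention to invent.  The
variable names are mine.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<tag parm1="very" parm2="cool">parameters</tag>')

value = elem.text              # the content between the open/close tags
modifiers = dict(elem.attrib)  # the attributes, as a plain dictionary
```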

  c) XMLish conf files, with indentation and comments, are actually
remarkably easy to human edit.  I do it all the time.  I >>prefer<< to
do it.  Even without tag documentation, I at least know the significant
elements of xml and thus know that I can e.g. rearrange things to look
pretty without breaking anything.  Now that I've found the key macro
definitions in e.g. .gconf, for example, I'll almost certainly edit them
directly by hand instead of with gconf-editor, if only because
gconf-editor produces %gconf.xml that is horribly ugly and unindented
for no particularly good reason (except that as a tool it is basically
only half finished).

  d) However, the final reason to use XMLish is (as already noted) it
makes it very easy to write a UI to create, edit, display the
configuration files.  XMLish is what, a style sheet or so away from
being directly renderable in any browser or browser form?  libxml2 makes
it easy (tedious, but easy) to parse an xml file, extract tag contents
and labels, and pack the values away into e.g. a control struct for the
application OR editor, permit editing, and write the result back into
the file.  Curiously, libxml2 >>will<< automagically indent (note the
output from xmlsysd, indented by library calls and not me), and this
indentation is the key to c).  Disk space is ludicrously cheap and
plentiful, and saving a few bytes per line at the expense of readability
is just ludicrous.
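
The parse/edit/write-back cycle can be sketched in a few lines.  This
is NOT libxml2; it uses Python's standard xml.etree.ElementTree as a
stand-in, and ET.indent() requires Python 3.9 or later.  The
unindented input string is made up for the example.

```python
import xml.etree.ElementTree as ET

# A deliberately ugly, unindented config (invented for this sketch).
UGLY = ('<yum version="2.0.0"><cachedir>/var/cache/yum</cachedir>'
        '<debuglevel>2</debuglevel></yum>')

root = ET.fromstring(UGLY)

# "Edit": change one value, as an editor UI might.
root.find("debuglevel").text = "4"

# Let the library do the indentation (Python 3.9+).
ET.indent(root, space="  ")

# Serialize back to text; this is what would be written to the file.
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

The point is only that the indentation is produced by a library call,
not by hand, so a tool has no excuse for writing unreadable output.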

Let me indicate where I'm coming from.  Writing xmlsysd (and procstatd
before it) I have, naturally, had to parse /proc.  /proc is a perfect
example of Diabolical Evil.  Every human who has ever contributed a
kernel component or module that writes to /proc has used their own
personal specially clever file layout for packing the data it presents.
/proc/meminfo presents (most of it) twice -- inexplicably, the first
form is the one humans generally see, except that bytes are converted by
"free" to K silently and +/- buffers is worked out.  The second form
humans NEVER see but the same values presented exactly in the first
three lines are re-presented in a variable: value size format in less
accurate K!  Then there is /proc/stat, where one can see three or
four different ways of packing data (none of the values labelled in any
case):

variable value1 value2 value3
variable value
variable: (a1,a2):(x1,x2,x3...) (b1,b2):(y1,y2,y3...) ...

(where the latter makes me want to tear my little remaining hair).
Finally there are straight tables (/proc/net/dev).  Sometimes the tables
actually have column labels -- this one does.  /proc/PID/maps (e.g.)
does not.  
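
To give a feel for the cost, here is a rough sketch of what it takes
to parse just the three line shapes listed above.  The sample lines in
the usage note are invented to match those shapes and are not copied
from a real /proc/stat; parse_stat_line() is a hypothetical helper,
and a real parser would need still more cases.

```python
import re


def parse_stat_line(line):
    """Hypothetical helper: handle three ad hoc layouts in one file."""
    name, _, rest = line.partition(" ")
    if name.endswith(":"):
        # Shape 3: variable: (a1,a2):(x1,x2,...) (b1,b2):(y1,y2,...)
        groups = re.findall(r"\(([\d,]+)\):\(([\d,]+)\)", rest)
        return name[:-1], {
            tuple(int(n) for n in key.split(",")):
                [int(n) for n in vals.split(",")]
            for key, vals in groups
        }
    values = [int(v) for v in rest.split()]
    # Shape 2 (single value) versus shape 1 (unlabelled vector).
    return name, values[0] if len(values) == 1 else values
```

For example, parse_stat_line("ctxt 12345") gives a bare integer,
parse_stat_line("cpu 100 20 30") gives a list, and a line such as
"disk_io: (3,0):(10,5,80) (3,1):(2,1,16)" gives a dictionary keyed by
number pairs.  In none of the three cases does the file say what the
numbers mean.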

Working our way down through /etc we see more of the same, only (if it
were possible) worse.  /etc/fstab is a straight ws-delimited table, but
a number of the table lines are vectors.  /etc/passwd is a :-delimited
table, ws in a field is data.  /etc/group is a :-delimited table with an
optional comma delimited variable length vector as one field.
/etc/exports is a table with diabolical rules regarding table entry that
is what,

mount  client1(opt1,opt2...) client2(opt3,opt4...) ...

a format that reduces even skilled and experienced sysadmins to cursing
when they brainlessly insert ws in the wrong place.  And we won't TOUCH
the nameserver tables where leaving off a ; or using a space instead of
a tab is death.  Even the xinetd designers blew it.  There they were,
redesigning everything and they opted for yet another variant of the
generic format used by yum now, but one with a meaningless leading term
("service"), bracket delimiters (in spite of the fact that it is already
FILE delimited) and variable [=,+=...] value [value] [value] otherwise.
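
The /etc/exports whitespace trap is easy to demonstrate with a toy
tokenizer.  This is a simplified sketch under my own assumptions, not
the parser the NFS tools use: it splits on whitespace, treats a bare
"(opts)" token as applying to any host (shown here as "*", my own
notation), and ignores comments, quoting and line continuation.

```python
import re

# host, optionally followed by a parenthesized option list
ENTRY = re.compile(r"^([^()]*)(?:\(([^)]*)\))?$")


def parse_export(line):
    """Toy parser: returns (mount, [(host, [options]), ...])."""
    mount, *entries = line.split()
    clients = []
    for entry in entries:
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError("malformed entry: %r" % entry)
        host, opts = match.groups()
        # An empty host means the options apply to everyone.
        clients.append((host or "*", opts.split(",") if opts else []))
    return mount, clients


tight = parse_export("/home client1(rw)")
loose = parse_export("/home client1 (rw)")
```

Here tight has one client, client1 with rw, while loose has two:
client1 with no options at all, and "*" (anyone) with rw.  One stray
space silently changes who gets write access.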

With this underpinning, it is little wonder that people look at Unix
with despair -- expert friendly doesn't begin to describe it.  A fairer
assessment would be that it is all part of a diabolical plot to ensure
that Unix/Linux sysadmins remain scarce and make a lot of money, because
you damn sure gotta be a genius in raw IQ just to keep track of what
lives where, in what format, and what is "meaningful" in each format.
No WONDER UI tools to manage all of this that actually work are rare --
think of the DIFFERENT rules that have to be encapsulated just to parse
any table and recreate it in a viable form, and I'm not talking about
checking or validating the actual data structure itself!

As you can see, this is a bit of a religious issue with me.  If every
new tool added to linux were rigorously written with a fully documented
xml non-interactive interface (whatever the interface might be) and
every major rewrite of every old tool grafted on an xml replacement for
the outta-my-ass interface that is already there, in perhaps 3-5 years
of moderate pain linux could emerge on the other side as being the first
pure-XML API OS in the world.  It would actually be POSSIBLE to write a
"supertool" for systems administration, even of a network, that wasn't a
horrible and unstable kludge and that totally fronted all the major
configurational entities, while still retaining the lovely ascii text
flatfile human manageable character that has distinguished unix for its
whole lifetime.

However, in the open source community this process has to begin one
application at a time, and people writing (or rewriting) the
applications have to communicate with other people and convert them with
their religious zeal and passion and by the pure example of their work.
Once there is a critical mass of components, supertools will start to
emerge because they can be built incrementally and trivially organized
underneath a common GUI shell, much as gnome is endeavoring to do now at
the user level.

So go thou, and preach the good word, brother...;-)

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@xxxxxxxxxxxx