RE: Factoring RPM sets (and parsing comps.xml)

"Oden, James" <James.Oden@xxxxxxxxxxx> · Fri, 27 Oct 2006 08:10:55 -0500

> 
> Ok, understood.
> 
> Well, the code to read the comps.xml file and stick it into a reasonably
> meaningful datastructure must exist at least, right?
>
There is something that parses the comps.xml.  Go to the anaconda web
page:

  http://rhlinux.redhat.com/anaconda/comps.html

The problem is that what you want is something that performs set
operations on the components, and that looks at a list of packages and
determines which components best fit that list. The library does not
provide this. The latter may not be too hard to build, starting with
the existing parser.  Just off the top of my head:

   - Parse with the existing library.
   - Walk through each component and its packages, and build a dictionary
     whose key is the package name and whose value is the component name.
   - Now, armed with this dictionary, go through your list of packages and
     see which components are used.
   - When you find that a component is used, keep a counter of how many
     packages from your list belong to that component.
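
To make those steps a bit more concrete, here is a minimal Python sketch.
It assumes the parser output has already been reduced to a plain dict of
component name -> list of package names; that shape, the function names,
and the sample data are all made up for illustration and are not the
parser's actual API. One deviation from the steps above: a package can be
listed in more than one component, so the value here is a set of component
names rather than a single name.

```python
from collections import Counter


def build_package_index(components):
    """Map each package name to the set of component names that list it."""
    index = {}
    for comp_name, packages in components.items():
        for pkg in packages:
            index.setdefault(pkg, set()).add(comp_name)
    return index


def count_component_usage(index, installed):
    """Count, per component, how many of the installed packages belong to it."""
    usage = Counter()
    for pkg in installed:
        # Packages unknown to every component simply contribute nothing.
        for comp_name in index.get(pkg, ()):
            usage[comp_name] += 1
    return usage


# Hypothetical sample data standing in for the parsed comps.xml.
components = {
    "base": ["bash", "coreutils", "glibc"],
    "editors": ["vim-enhanced", "emacs"],
}
installed = ["bash", "coreutils", "vim-enhanced", "my-local-tool"]

index = build_package_index(components)
usage = count_component_usage(index, installed)
```

With the sample data, "base" ends up with a count of 2 and "editors" with
a count of 1, while "my-local-tool" matches no component at all.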

That gives you, in raw form, which components are used, plus the raw
information needed to make a final pass at the end and apply some
heuristic to decide which components have been added to or subtracted
from.  For example, if you have less than 50% utilization of a component,
throw that component away and treat its packages as ones that were added
individually (this is one possible heuristic).
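
A rough Python sketch of that final pass might look like the following.
Again the input shape (a dict of component name -> package names), the
function name, and the sample data are assumptions for illustration only,
and the 50% cutoff is just the example threshold from above, not a
recommended value.

```python
def classify_components(components, installed, threshold=0.5):
    """Split components into kept and dropped, based on utilization.

    Returns (kept, dropped, extras):
      kept    - dict of component name -> packages missing from `installed`
      dropped - component names whose utilization is below `threshold`
      extras  - installed packages not covered by any kept component
    """
    installed = set(installed)
    kept = {}
    dropped = []
    for name, packages in components.items():
        members = set(packages)
        if not members:
            continue  # skip empty components to avoid dividing by zero
        utilization = len(members & installed) / len(members)
        if utilization < threshold:
            dropped.append(name)
        else:
            # Packages of a kept component that are absent ("subtracted").
            kept[name] = sorted(members - installed)
    covered = set().union(*(components[name] for name in kept))
    extras = sorted(installed - covered)
    return kept, dropped, extras


# Hypothetical sample data standing in for the parsed comps.xml.
components = {
    "base": ["bash", "coreutils", "glibc"],
    "editors": ["vim-enhanced", "emacs", "nano", "joe"],
}
installed = ["bash", "coreutils", "vim-enhanced", "my-local-tool"]

kept, dropped, extras = classify_components(components, installed)
```

Here "base" is kept (2 of 3 packages present, with glibc recorded as
subtracted), "editors" is dropped (1 of 4), and vim-enhanced falls out as
an individually added package alongside my-local-tool.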

This is of course a real rough sketch of an algorithm.  YMMV.

> I'll dig around the Anaconda and Kickstart sources when I'm back in
> front of a development machine...  hopefully there'll be something that
> can be reused.
Unfortunately there are not a lot of higher level distribution build and
management tools out there (there are some, but they're just not at the
level of management you are looking for), so you're going into, AFAICT,
uncharted territory.

Seriously, good luck...james