>
> Ok, understood.
>
> Well, the code to read the comps.xml file and stick it into a reasonably
> meaningful datastructure must exist at least, right?
>

There is something that parses the comps.xml. Go to the anaconda web page:

http://rhlinux.redhat.com/anaconda/comps.html

The problem is that what you want is something that basically performs set
operations on the components: it looks at a list of packages and determines
which components best fit that list. The library does not provide that.

The latter may not be too hard starting with the existing parser. Just off
the top of my head:

- Parse with the existing library.
- Walk through each component and its packages and build a dictionary
  whose key is package name and value is component name.
- Now armed with this dictionary, go through your list of packages and
  see what components are used.
- When you find a component is used, keep a counter of how many packages
  have been used by this component.

That would give you, in raw form, which components are used, plus the raw
information to do a final pass at the end and apply some heuristic to pick
which components have been added to or subtracted from. For example, if you
have less than 50% utilization of a component then throw that component away
and say the packages were added to it (this is one possible heuristic).

This is of course a real rough sketch of an algorithm. YMMV.

> I'll dig around the Anaconda and Kickstart sources when I'm back in
> front of a development machine... hopefully there'll be something that
> can be reused.

Unfortunately there are not a lot of higher level distribution build and
management tools out there (there are some, but they're just not at the
level of management you are looking for), so you're going into, AFAICT,
uncharted territory.

Seriously, good luck...james
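
P.S. Here is a minimal Python sketch of the dictionary-and-counter idea described above, in case it helps. It does not use the real comps parser: it assumes you have already turned the parsed comps.xml into a plain dict mapping component name to a list of package names. The function names and the `threshold` parameter are made up for illustration. One deviation from the rough outline: a package can appear in more than one component, so the index maps each package to a set of component names rather than a single name.

```python
from collections import Counter, defaultdict


def build_package_index(components):
    """Map each package name to the set of components that list it.

    `components` is an assumed shape: {component name: [package names]}.
    """
    index = defaultdict(set)
    for comp_name, packages in components.items():
        for pkg in packages:
            index[pkg].add(comp_name)
    return index


def guess_components(components, installed, threshold=0.5):
    """Return (kept, extras) for a list of installed package names.

    kept   -- {component name: utilization} for components at or above
              the threshold
    extras -- sorted packages not covered by any kept component
    """
    index = build_package_index(components)
    installed = set(installed)

    # Count how many installed packages fall into each component.
    used = Counter()
    for pkg in installed:
        for comp_name in index.get(pkg, ()):
            used[comp_name] += 1

    # Final pass: apply the utilization heuristic.
    kept = {}
    for comp_name, count in used.items():
        total = len(set(components[comp_name]))
        utilization = count / total
        if utilization >= threshold:
            kept[comp_name] = utilization

    # Anything outside the kept components is listed individually.
    covered = set()
    for comp_name in kept:
        covered.update(components[comp_name])
    extras = sorted(installed - covered)
    return kept, extras


if __name__ == "__main__":
    # Made-up sample data, not taken from a real comps.xml.
    comps = {
        "Base": ["bash", "glibc", "rpm", "setup"],
        "Web Server": ["apache", "mod_ssl", "php", "mod_perl"],
    }
    pkgs = ["bash", "glibc", "rpm", "apache", "emacs"]
    kept, extras = guess_components(comps, pkgs)
    print(kept)
    print(extras)
```

With the made-up data above, "Base" is kept (3 of its 4 packages are installed) and "Web Server" is thrown away (1 of 4), so apache ends up listed as an individual package alongside emacs. It does not report which packages were subtracted from a kept component (setup, in this case); that would be one more set difference per kept component. The 50% cutoff is just the one heuristic mentioned above, so tune it or swap it for something else.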