Here's my idea. If you've already thought of this and rejected it, I'd
love to know why.

1. Switch yum to use -C by default, and introduce a different flag to
   signal a cache update. Day-to-day operations can then run against a
   pregenerated hash and a prebuilt cache.

2. Build a serializable hash structure, or reuse an existing mechanism.
   I don't care whether it's stored in XML or db4 or just a plain text
   file, as long as it's fast to read in and write out. This alone turns
   N fopen/fread/fclose calls into a single set. It won't do much for
   memory usage, though.

3. Change the yum cron job to use the new flag (the "-!C" above, for
   want of a real name) so that the hash index gets regenerated nightly.

(Rough Python sketches of item 2, and of the sync and metadata questions
below, are in the P.S. at the end of this mail.)

As I don't think this would be too difficult, I'll try generating some
patches against HEAD, though as a non-Python programmer the result may
look a little Perl-ish.

Joseph

seth vidal wrote:
>> First idea: I remember that hashes are fast to search but,
>> comparatively, very slow to grow. To overcome this, most hash
>> libraries let you set an initial size, best guessed large enough to
>> accommodate all the entries, to avoid frequent, time-consuming
>> resizes while filling. Does your package offer this feature?
>
> Have you programmed in Python before? A simple Python dict is what I'm
> talking about. You can build up that sort of dict and traverse it, but
> you still have to:
>
> open the package
> get the data you want
> put the data in the dict
> close the package
>
> Doesn't sound too bad - but the process of opening and looking through
> a package does take some time.
>
>> Second idea: you mentioned package traversal as time-consuming. Is
>> this time spent opening each package as a DB, grabbing the info, and
>> closing it? If so, have you considered building a cache of package
>> contents that can be updated and reused in subsequent runs, to take
>> advantage of the fact that most (if not all) packages do not change
>> between yum runs?
>
> In this case it's opening up each header, getting the data and moving
> along, but yes, it can take some time to search each one.
>
> Where do you store that cache? How do you store it? How do you update
> it to make sure it's not out of sync with the repository w/o
> reindexing all the headers/packages? Feel free to answer any/all of
> those questions.
>
> Some of these have already been addressed - many of them are why I
> spent so much time working on the xml-metadata, to sort out
> easier/faster/better ways of indexing the packages so yum can:
>
> 1. know if there are changes
> 2. more easily traverse the packages and the metadata
> 3. have smaller amounts of data to download and sort through on any run.
>
> Right now I'm making those changes work; then I'm going to focus on
> trimming time out of each session. It will still be some time b/c I'm
> working on this as I can.
>
> If you want to be a big help, don't look at speedups for the 2.0.X
> branch. I don't want to spend more time on 2.0.X if at all possible.
> A lot of things in the structure have changed, and cvs-HEAD is where
> I'm trying to work the most. When I have a snapshot that does some
> useful things I'll be sure to announce it here and on yum-devel.
>
> If you're a Python programmer and you're familiar with libxml2, then
> take a look at http://linux.duke.edu/metadata/generate/ - feel free to
> make that code:
>
> 1. look for an existing repodata dir
> 2. if it finds one, use the xml files there to speed up the creation
>    of the updated metadata for that repository.
> -sv
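
P.S. Since I'm offering patches anyway, here's roughly what I have in
mind for item 2, sketched in Python (so flag anything too Perl-ish).
The directory layout, cache keys, and tag choices are illustrative
assumptions, not yum's actual internals:

import os
import cPickle
import rpm

def read_header(ts, path):
    # open the package, pull the header, close the package -
    # exactly the per-package cost we want to pay only once
    fd = os.open(path, os.O_RDONLY)
    try:
        return ts.hdrFromFdno(fd)
    finally:
        os.close(fd)

def build_cache(pkgdir):
    ts = rpm.TransactionSet()
    ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES)  # don't choke on unsigned pkgs
    cache = {}
    for fn in os.listdir(pkgdir):
        if not fn.endswith('.rpm'):
            continue
        path = os.path.join(pkgdir, fn)
        st = os.stat(path)
        hdr = read_header(ts, path)
        cache[fn] = {
            'stamp':   (st.st_size, st.st_mtime),  # for staleness checks
            'name':    hdr[rpm.RPMTAG_NAME],
            'epoch':   hdr[rpm.RPMTAG_EPOCH],
            'version': hdr[rpm.RPMTAG_VERSION],
            'release': hdr[rpm.RPMTAG_RELEASE],
            'arch':    hdr[rpm.RPMTAG_ARCH],
        }
    return cache

def save_cache(cache, cachefile):
    f = open(cachefile, 'wb')
    cPickle.dump(cache, f, 1)   # one binary write instead of N
    f.close()

def load_cache(cachefile):
    f = open(cachefile, 'rb')   # the single fopen/fread/fclose set
    cache = cPickle.load(f)
    f.close()
    return cache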
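For your out-of-sync question, the cheapest answer I can see: stamp each
entry with (size, mtime) and re-open a header only when the stamp
changes. Checksums would be sturdier; this is the minimal version, and
it reuses read_header() and the entry layout from the sketch above:

def refresh_cache(pkgdir, old):
    ts = rpm.TransactionSet()
    ts.setVSFlags(rpm._RPMVSF_NOSIGNATURES)
    fresh = {}
    for fn in os.listdir(pkgdir):
        if not fn.endswith('.rpm'):
            continue
        path = os.path.join(pkgdir, fn)
        st = os.stat(path)
        stamp = (st.st_size, st.st_mtime)
        entry = old.get(fn)
        if entry is not None and entry['stamp'] == stamp:
            fresh[fn] = entry        # unchanged: no header open at all
            continue
        hdr = read_header(ts, path)  # new or changed: re-read this one
        fresh[fn] = {
            'stamp':   stamp,
            'name':    hdr[rpm.RPMTAG_NAME],
            'epoch':   hdr[rpm.RPMTAG_EPOCH],
            'version': hdr[rpm.RPMTAG_VERSION],
            'release': hdr[rpm.RPMTAG_RELEASE],
            'arch':    hdr[rpm.RPMTAG_ARCH],
        }
    return fresh  # packages removed from the repo simply drop out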
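And a first stab at the two things you asked for in the metadata
generator: index the old primary.xml by location href and file
timestamp, so unchanged packages never get re-opened. I'm assuming an
uncompressed primary.xml sitting in repodata/ and using the libxml2
bindings you mentioned; the function name and dict shape are mine:

import os
import libxml2

MD_NS = 'http://linux.duke.edu/metadata/common'

def index_old_primary(repodir):
    # Map location href -> (recorded file time, serialized <package> node)
    path = os.path.join(repodir, 'repodata', 'primary.xml')
    if not os.path.exists(path):
        return {}            # no old metadata: fall back to a full run
    doc = libxml2.parseFile(path)
    ctxt = doc.xpathNewContext()
    ctxt.xpathRegisterNs('md', MD_NS)
    old = {}
    for pkg in ctxt.xpathEval('//md:package'):
        ctxt.setContextNode(pkg)
        href = ctxt.xpathEval('string(md:location/@href)')
        ftime = ctxt.xpathEval('string(md:time/@file)')
        old[href] = (ftime, pkg.serialize())
    ctxt.xpathFreeContext()
    doc.freeDoc()
    return old

The generator could then stat each package and, whenever the recorded
file time still matches the file's mtime, write the saved fragment back
out instead of opening the rpm at all.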