On Fri, 30 Apr 2004, seth vidal wrote:

> Have you programmed in python before? A simple python dict is what I'm
> talking about.

No Python, just Perl. I looked around and it looks like there is no way to
pre-size the dict to avoid many resizes while filling it with data. We may
drop this idea.

> You can build up that sort of dict and traverse it but
> you still have to:
>
> open the package
> get the data you want
> put the data in the dict
> close the package
>
> Doesn't sound too bad - but the process for opening and looking through
> a package does take some time.

Perhaps this can be cut down by addressing only new/deleted packages on
each run.

> Where do you store that cache?

On disk; it should live between runs.

> How do you store it?

Oops, too specific for my Python knowledge! In Perl, for example, it is as
trivial as tying a hash to an on-disk DB...

> How do you update it
> to make sure it's not out of sync with the repository w/o reindexing all
> the headers/packages?

Ideally without opening/scanning each package, using only file system
info. I see that header.info carries only the file name & path, but, to my
understanding, this should be enough to spot additions/deletions.

Let's consider the steps:

- load the local cache of package names (nDB) and the current header.info
  from the repos;
- build one list of additions and one of deletions with respect to nDB,
  based on file names;
- load the big DB (pDB), the one with all the package info (version,
  requires, provides, etc.);
- remove the deleted packages from pDB, grab the info for the added
  packages, and add it to pDB.

Now pDB should hold the current repo situation in detail and the real job
may begin -- computing dependencies, etc. Just before exit, save pDB for
the next run.

If it doesn't make any sense, don't bother to answer, and sorry for
ranting. :)

Mihai
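
P.S. The steps above could be sketched in Python with the standard-library
shelve module, which ties a dict to an on-disk DB much like a tied hash in
Perl, so pDB doubles as the name cache (nDB). This is only an illustration
under assumptions: the helper names and the header.info line format shown
here are guesses, and extract_package_info is a stub for the real
open-the-package work.

```python
import shelve

def read_header_info(path):
    """Parse a header.info file into a set of package file names.
    Each non-empty line is assumed to look like
    'epoch:name-ver-rel.arch=path/to/package.rpm'."""
    names = set()
    for line in open(path):
        line = line.strip()
        if line:
            names.add(line.split('=')[-1])  # keep only the file path part
    return names

def extract_package_info(filename):
    """Stub: here the real code would open the package and pull out
    version, requires, provides, etc."""
    return {'filename': filename}

def update_cache(header_info_path, cache_path):
    # pDB: a dict persisted on disk, serving also as the name cache (nDB).
    pdb = shelve.open(cache_path)
    cached = set(pdb.keys())                  # names known from last run
    current = read_header_info(header_info_path)

    for name in cached - current:             # deletions
        del pdb[name]
    for name in current - cached:             # additions: open only these
        pdb[name] = extract_package_info(name)

    pdb.close()                               # saved for the next run
```

Only the added packages ever get opened; deletions cost a dict delete, and
everything else is reused from the previous run.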