On Fri, 30 Apr 2004, seth vidal wrote:

> Have you programmed in python before? A simple python dict is what I'm
> talking about.

No Python, just Perl. I looked around and it looks like there is no way to
pre-size the dict to avoid many resizes while filling it with data. We may
drop this idea.

> You can build up that sort of dict and traverse it but
> you still have to:
>
> open the package
> get the data you want
> put the data in the dict
> close the package
>
> Doesn't sound too bad - but the process for opening and looking through
> a package does take some time.

Perhaps this can be cut down by addressing only new/deleted packages on
each run.

> Where do you store that cache?

On disk; it should live between runs.

> How do you store it?

Oops, too specific for my Python knowledge! In Perl, for example, it is as
trivial as tying a hash to an on-disk DB...

> How do you update it
> to make sure it's not out of sync with the repository w/o reindexing all
> the headers/packages?

Ideally without opening/scanning each package, using only file system
info. I see that header.info carries only the file name & path, but, to my
understanding, this should be enough to spot additions/deletions.

Let's consider the steps:

- load the local cache of package names (nDB) and the current header.info
  from the repos;
- build one list of additions and one of deletions with respect to nDB,
  based on file names;
- load the big DB (pDB), the one with all the package info (version,
  requires, provides, etc.);
- remove the deleted packages from pDB, grab the info for the added
  packages, and add it to pDB.

Now pDB should hold the current repo situation in detail and the real job
may begin -- computing dependencies, etc. Just before exit, save pDB for
the next run.

If it doesn't make any sense, don't bother to answer, and sorry for
ranting. :)

Mihai
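
P.S. The steps above could be sketched in Python with the standard-library
shelve module, which ties a dict to an on-disk DB much like a tied hash in
Perl, so pDB doubles as the name cache (nDB). This is only an illustration
under assumptions: the helper names and the header.info line format shown
here are guesses, and extract_package_info is a stub for the real
open-the-package work.

```python
import shelve

def read_header_info(path):
    """Parse a header.info file into a set of package file names.
    Each non-empty line is assumed to look like
    'epoch:name-ver-rel.arch=path/to/package.rpm'."""
    names = set()
    for line in open(path):
        line = line.strip()
        if line:
            names.add(line.split('=')[-1])  # keep only the file path part
    return names

def extract_package_info(filename):
    """Stub: here the real code would open the package and pull out
    version, requires, provides, etc."""
    return {'filename': filename}

def update_cache(header_info_path, cache_path):
    # pDB: a dict persisted on disk, serving also as the name cache (nDB).
    pdb = shelve.open(cache_path)
    cached = set(pdb.keys())                  # names known from last run
    current = read_header_info(header_info_path)

    for name in cached - current:             # deletions
        del pdb[name]
    for name in current - cached:             # additions: open only these
        pdb[name] = extract_package_info(name)

    pdb.close()                               # saved for the next run
```

Only the added packages ever get opened; deletions cost a dict delete, and
everything else is reused from the previous run.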