Dimitrios Apostolou <jimis@xxxxxxx> writes:

> Hello list,
>
> I have been using fedora on various machines, many of which are fairly
> old, so I'm constantly trying to remove unnecessary fat and make
> things speedier. Unfortunately when the basic package manager is slow
> things aren't looking too good.
>
> Running only "yum help" on an 800MHz PC with fedora 11 needs about
> 2.2s. Running "yum check-update" takes more than 20s to return an
> empty list.
[...]
> Perhaps I shouldn't even mention how yum (old version) slowness looks
> in an old sparcstation 5 running Aurora Linux. It needs hours for
> performing operations and is constantly swapping. It is the most
> important obstacle for using that distro on such machinery.

If that's your way of asking if we'll help you with patches to make yum
faster, then yes we will ... up to a point. It should not surprise you
that hardware from 6 to over 10 years ago is not going to be what most
people are developing or testing with.

> So I've been doing some profiling on yum.
> As far as "yum help" is concerned, I haven't reached any important
> conclusions. Most time is consumed in ini-parsing, URL parsing and
> python module initialisations.

It'd be nice to have some numbers. But I can confirm that on a modernish
machine "yum list yum" seems to take roughly a second, and python init
and ini-parsing are significant parts of that.

> Really way too much diverse stuff to
> try and improve something.

*shrug*, that's mostly what performance work is.

> FYI functions to look into are
> getReposFromConfig@xxxxxxxxxxx and readStartupConfig@xxxxxxxxx and
> object initialisations (__init__.py?) in general.
>
> As far as check-update goes, _buildPkgObjList@xxxxxxxxxxxxx takes by
> far the most time. The current way it works is by doing one query to
> sqlite returning all packages, and then manually parsing the result
> for excludes and converting it to python objects, all done with
> repetitive python code.

True.
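As a rough way of getting numbers for the pattern described above (one
query returning every package, followed by per-row conversion to Python
objects), the two costs can be timed separately. This is only a sketch:
the table layout, the `Pkg` class, and the helper names are hypothetical
stand-ins and are not yum's schema or code, and the data is synthetic.

```python
import sqlite3
import time


class Pkg:
    """Hypothetical stand-in for a package object; not yum's class."""

    def __init__(self, row):
        self.name, self.arch, self.epoch, self.version, self.release = row
        self.pkgtup = tuple(row)


def make_db(n=20000):
    """Build an in-memory table with an assumed (not yum's) schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE packages "
        "(name TEXT, arch TEXT, epoch TEXT, version TEXT, release TEXT)"
    )
    rows = [("pkg%d" % i, "i386", "0", "1.0", "1") for i in range(n)]
    db.executemany("INSERT INTO packages VALUES (?, ?, ?, ?, ?)", rows)
    return db


def timed(func):
    """Return (result, elapsed seconds) for a single call of func."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


QUERY = "SELECT name, arch, epoch, version, release FROM packages"
db = make_db()

# Cost of the select alone, keeping plain (n, a, e, v, r) tuples.
tuples, t_select = timed(lambda: db.execute(QUERY).fetchall())

# Cost of the same select plus one Python object per row.
objects, t_objects = timed(lambda: [Pkg(r) for r in db.execute(QUERY)])

print("select only:      %.4fs for %d rows" % (t_select, len(tuples)))
print("select + objects: %.4fs for %d rows" % (t_objects, len(objects)))
```

The difference between the two timings approximates the object-creation
overhead on a given machine. It says nothing about the exclude handling,
which would need its own measurement.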
> Is there a reason for not using a proper SQL query for returning all
> packages needed, excluding excludes?

A few reasons, but are you sure you need to try that? If you just stop
the package creation, does that help? -- ie. have simplePkgList() return
the pkgtups without creating package objects first?

> I can see the following comment:
>
> # Note: If we are building the pkgobjlist, we don't exclude
> # here, so that we can un-exclude later on ... if that matters.
>
> Does that matters?

No, that comment needs to die. See the comment a couple of lines down
from it.

> If we really take advantage of sqlite and build a query returning
> exactly what we want, then why do we need to build separate python
> PackageObject list?
>
> I attach a patch which improves a lot the time needed for check-update
> by avoiding to populate the YumSqlitePackageSack objects and by
> calculating updates only using the (n,a,e,v,r) list
> returned. _buildPkgObjList is not even used. For this simple case it
> works so it makes me wonder...
>
> What do you think? Is this preliminary patch in the right direction?
> What do you propose for improving speed even further but not breaking
> existing functionality?

Don't create returnPackageTuples() and change
PackageSack.simplePkgList(), just override simplePkgList() for
YumSqlitePackageSack().

The patch (and later versions) is incomplete: you are only implementing
include.match and exclude.match from the excluder API.
You don't implement the matching properly, as you are running the GLOBs
only on package names.
You don't implement include.match properly: the traditional behaviour is
that a package has to pass _both_ "includepkgs" and "exclude", not
either.

That's fine as a proof of concept, but you didn't mark the patches as
being that.

I doubt you've tried many exclusions, as I'm pretty sure sqlite will
fail (which is why we have the limits like PATTERNS_INDEXED_MAX).
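The two matching points above (globs applied to more than the bare
package name, and a package having to pass both "includepkgs" and
"exclude") can be illustrated with a small sketch. The function names
and the list of candidate strings are assumptions made for illustration;
the thread does not list the exact forms yum's excluder checks, and the
package data is made up.

```python
from fnmatch import fnmatchcase


def candidate_strings(pkgtup):
    """Spellings of a package that a pattern might be written against.

    Assumed set, chosen only to show why name-only matching is too
    narrow; it is not taken from yum's code.
    """
    n, a, e, v, r = pkgtup
    return [
        n,
        "%s.%s" % (n, a),
        "%s-%s" % (n, v),
        "%s-%s-%s" % (n, v, r),
        "%s-%s-%s.%s" % (n, v, r, a),
        "%s:%s-%s-%s.%s" % (e, n, v, r, a),
    ]


def matches_any(pkgtup, patterns):
    """True if any glob pattern matches any spelling of the package."""
    return any(
        fnmatchcase(s, pat)
        for pat in patterns
        for s in candidate_strings(pkgtup)
    )


def keep(pkgtup, includepkgs, exclude):
    """Keep a package only if it passes BOTH filters.

    Assumption: an empty includepkgs list means "no include filter".
    """
    if includepkgs and not matches_any(pkgtup, includepkgs):
        return False
    if exclude and matches_any(pkgtup, exclude):
        return False
    return True


# Made-up (n, a, e, v, r) tuples.
pkgs = [
    ("kernel", "i686", "0", "2.6.29", "1"),
    ("kernel-devel", "i686", "0", "2.6.29", "1"),
    ("yum", "noarch", "0", "3.2.23", "1"),
]
kept = [p for p in pkgs if keep(p, ["kernel*"], ["*-devel*"])]
print(kept)  # [('kernel', 'i686', '0', '2.6.29', '1')]
```

An exclude such as "*.noarch" matches the "yum.noarch" spelling here but
would match nothing if the glob were run against the name alone, which
is the kind of gap a name-only SQL filter would leave.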
You can't alter the .sqlite files as you've done in the last version of
your patch ... ie. temporary tables can't be used.

You've not given any results:

1. How long did the old SQL query take.

2. How long does the new SQL query take.

3. How long does the python pkgExcluder code take.

4. What is 2 vs. 3 for small/large exclusions.

...and as I said above, it'd be nice to know how much time is taken up
with just "package object creation" as against the select + python
exclude.

Also check-updates isn't the best thing to measure, as it's not that
simple (requiring all pkg data to be loaded) and apparently doesn't
require much more than the pkgtups for most of the data (maybe that's
true for update/install/etc. in general though).

You might want to come onto IRC #yum on FreeNode to talk to us tomorrow.

-- 
James Antill -- james@xxxxxxx
_______________________________________________
Yum mailing list
Yum@xxxxxxxxxxxxxxxxx
http://lists.baseurl.org/mailman/listinfo/yum