Re: yum performance

James Antill <james-yum@xxxxxxx> · Mon, 17 Aug 2009 02:16:05 -0400

Dimitrios Apostolou <jimis@xxxxxxx> writes:

> Hello list,
>
> I have been using fedora on various machines, many of which are fairly
> old, so I'm constantly trying to remove unnecessary fat and make
> things speedier. Unfortunately when the basic package manager is slow
> things aren't looking too good.
>
> Running only "yum help" on an 800MHz PC with fedora 11 needs about
> 2.2s. Running "yum check-update" takes more than 20s to return an
> empty list.

[...]

> Perhaps I shouldn't even mention how yum (old version) slowness looks
> in an old sparcstation 5 running Aurora Linux. It needs hours for
> performing operations and is constantly swapping. It is the most
> important obstacle for using that distro on such machinery.

 If that's your way of asking if we'll help you with patches to make
yum faster, then yes we will ... up to a point.
 It should not surprise you that hardware from 6 to over 10 years ago
is not going to be what most people are developing or testing with.

> So I've been doing some profiling on yum.
> As far as "yum help" is concerned, I haven't reached any important
> conclusions. Most time is consumed in ini-parsing, URL parsing and
> python module initialisations.

 It'd be nice to have some numbers. But I can confirm that on a
modernish machine "yum list yum" seems to take roughly a second, and
python init and ini-parsing are significant parts of that.
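
A minimal sketch of one way to collect such numbers, using the standard cProfile and pstats modules. The fake_startup() function is a hypothetical stand-in for config parsing and module initialisation, not yum code; in practice the real entry point would be passed in instead.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


def fake_startup():
    # Hypothetical stand-in for config parsing and module initialisation.
    return sum(len(str(i)) for i in range(20000))


if __name__ == "__main__":
    _, report = profile_call(fake_startup)
    print(report)
```

The cumulative-time column is usually the quickest way to see whether ini-parsing or imports dominate a short command.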

> Really way too much diverse stuff to
> try and improve something.

 *shrug*, that's mostly what performance work is.

> FYI functions to look into are
> getReposFromConfig@xxxxxxxxxxx and readStartupConfig@xxxxxxxxx and
> object initialisations (__init__.py?) in general.
>
> As far as check-update goes, _buildPkgObjList@xxxxxxxxxxxxx takes by
> far the most time. The current way it works is by doing one query to
> sqlite returning all packages, and then manually parsing the result
> for excludes and converting it to python objects, all done with
> repetitive python code.

 True.

> Is there a reason for not using a proper SQL query for returning all
> packages needed, excluding excludes?

 A few reasons, but are you sure you need to try that? If you just
stop the package creation, does that help? -- i.e. have simplePkgList()
return the pkgtups without creating package objects first?
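
As a rough illustration of that idea, the sketch below returns (n, a, e, v, r) tuples straight from the query rows without building any package objects. The in-memory table, its column set and the sample rows are assumptions made up for the example, and simple_pkg_list() is a hypothetical helper, not the real simplePkgList().

```python
import sqlite3


def make_demo_db():
    # Hypothetical stand-in for a repo's primary sqlite database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages "
        "(name TEXT, arch TEXT, epoch TEXT, version TEXT, release TEXT)"
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?, ?, ?, ?)",
        [
            ("yum", "noarch", "0", "3.2.23", "8.fc11"),
            ("bash", "i586", "0", "4.0", "6.fc11"),
        ],
    )
    return conn


def simple_pkg_list(conn):
    """Return (n, a, e, v, r) tuples straight from the query rows."""
    cur = conn.execute(
        "SELECT name, arch, epoch, version, release FROM packages"
    )
    return [tuple(row) for row in cur]


if __name__ == "__main__":
    for pkgtup in simple_pkg_list(make_demo_db()):
        print(pkgtup)
```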

> I can see the following comment:
>
> #  Note: If we are building the pkgobjlist, we don't exclude
> # here, so that we can un-exclude later on ... if that matters.
>
> Does that matter?

 No, that comment needs to die. See the comment a couple of lines down
from it.

> If we really take advantage of sqlite and build a query returning
> exactly what we want, then why do we need to build separate python
> PackageObject list?
>
> I attach a patch which improves a lot the time needed for check-update
> by avoiding to populate the YumSqlitePackageSack objects and by
> calculating updates only using the (n,a,e,v,r) list
> returned. _buildPkgObjList is not even used. For this simple case it
> works so it makes me wonder...
>
> What do you think? Is this preliminary patch in the right direction? 
> What do you propose for improving speed even further but not breaking
> existing functionality?

 Don't create returnPackageTuples() and change
PackageSack.simplePkgList(), just override simplePkgList() for
YumSqlitePackageSack().

 The patch (and later versions) is incomplete: you are only
implementing include.match and exclude.match from the excluder API.
 You don't implement the matching properly, as you are running the
GLOBs only on package names.
 You don't implement include.match properly: the traditional behaviour
is that a package has to pass _both_ "includepkgs" and "exclude", not
either.
 That's fine as a proof of concept, but you didn't mark the patches as
being that.
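
The two matching points can be sketched as follows. This is only an illustration under stated assumptions: the list of string forms in match_forms() is a guess at what "more than the package name" could mean, and keep_package() is a hypothetical helper, not the excluder API.

```python
from fnmatch import fnmatchcase


def match_forms(pkgtup):
    """Strings a glob may be tested against, not just the bare name."""
    n, a, e, v, r = pkgtup
    # Assumed set of forms; the real matcher may use a different list.
    return [
        n,
        "%s.%s" % (n, a),
        "%s-%s" % (n, v),
        "%s-%s-%s" % (n, v, r),
        "%s-%s-%s.%s" % (n, v, r, a),
        "%s:%s-%s-%s.%s" % (e, n, v, r, a),
    ]


def matches_any(pkgtup, patterns):
    forms = match_forms(pkgtup)
    return any(fnmatchcase(f, p) for p in patterns for f in forms)


def keep_package(pkgtup, includepkgs, exclude):
    """Keep a package only if it passes both filters, not either."""
    if includepkgs and not matches_any(pkgtup, includepkgs):
        return False
    if exclude and matches_any(pkgtup, exclude):
        return False
    return True
```

With this shape, a package that matches "includepkgs" is still dropped when it also matches "exclude", and a glob such as "*.i686" works because it is tried against name.arch as well as the name.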

 I doubt you've tried many exclusions, as I'm pretty sure sqlite will
fail (which is why we have the limits like PATTERNS_INDEXED_MAX).
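
A sketch of the kind of guard such a limit implies: push the globs into the SQL statement only while the pattern count is small, and fall back to filtering in Python otherwise. MAX_SQL_PATTERNS, its value, the one-column table and excluded_names() are all hypothetical; which sqlite limit is actually hit is not stated here, though sqlite does document caps on expression depth and on bound parameters per statement.

```python
import sqlite3
from fnmatch import fnmatchcase

# Hypothetical threshold, chosen for illustration only.
MAX_SQL_PATTERNS = 128


def demo_connection():
    # Hypothetical stand-in for a repo database with only a name column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (name TEXT)")
    conn.executemany(
        "INSERT INTO packages VALUES (?)",
        [("kernel",), ("bash",), ("yum",)],
    )
    return conn


def excluded_names(conn, patterns):
    """Return the set of package names matching any exclude glob."""
    patterns = list(patterns)
    if not patterns:
        return set()
    if len(patterns) <= MAX_SQL_PATTERNS:
        # Few patterns: let sqlite evaluate one GLOB term per pattern.
        where = " OR ".join(["name GLOB ?"] * len(patterns))
        cur = conn.execute("SELECT name FROM packages WHERE " + where, patterns)
        return {row[0] for row in cur}
    # Too many patterns for one statement: select everything, filter in Python.
    cur = conn.execute("SELECT name FROM packages")
    return {
        row[0] for row in cur
        if any(fnmatchcase(row[0], p) for p in patterns)
    }
```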

 You can't alter the .sqlite files as you've done in the last version
of your patch ... i.e. temporary tables can't be used.

 You've not given any results:

   1. How long did the old SQL query take?
   2. How long does the new SQL query take?
   3. How long does the python pkgExcluder code take?
   4. How do 2 and 3 compare for small/large sets of exclusions?

...and as I said above, it'd be nice to know how much time is taken up
with just "package object creation" as against the select + python
exclude.
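
One possible shape for a harness that times each stage separately. Everything here is a stand-in: fetch_tuples() imitates the select, Pkg imitates a package object, and time_stage() is a hypothetical helper, so the numbers it prints say nothing about yum itself.

```python
import time


def time_stage(func, *args, repeat=5):
    """Return (best wall-clock seconds, last result) over `repeat` runs."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


class Pkg:
    """Hypothetical stand-in for a package object."""

    def __init__(self, pkgtup):
        self.name, self.arch, self.epoch, self.version, self.release = pkgtup


def fetch_tuples(count):
    # Stand-in for the select: just builds (n, a, e, v, r) tuples.
    return [("pkg%d" % i, "noarch", "0", "1.0", "1") for i in range(count)]


def build_objects(pkgtups):
    # Stand-in for package object creation.
    return [Pkg(t) for t in pkgtups]


if __name__ == "__main__":
    t_select, tuples = time_stage(fetch_tuples, 20000)
    t_objects, _ = time_stage(build_objects, tuples)
    print("select: %.4fs  object creation: %.4fs" % (t_select, t_objects))
```

Taking the best of several runs reduces noise from caching and other processes, which matters when the stages being compared differ by fractions of a second.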

 Also check-update isn't the best thing to measure, as it's not that
simple (requiring all pkg data to be loaded) and apparently doesn't
require much more than the pkgtups for most of the data (maybe that's
true for update/install/etc. in general though).

 You might want to come onto IRC #yum on FreeNode to talk to us
tomorrow.

-- 
James Antill -- james@xxxxxxx