Re: yum performance

On Mon, 17 Aug 2009, James Antill wrote:

Dimitrios Apostolou <jimis@xxxxxxx> writes:

Hello list,

I have been using fedora on various machines, many of which are fairly
old, so I'm constantly trying to remove unnecessary fat and make
things speedier. Unfortunately when the basic package manager is slow
things aren't looking too good.

Running only "yum help" on an 800MHz PC with fedora 11 needs about
2.2s. Running "yum check-update" takes more than 20s to return an
empty list.

[...]

Perhaps I shouldn't even mention how yum (old version) slowness looks
in an old sparcstation 5 running Aurora Linux. It needs hours for
performing operations and is constantly swapping. It is the most
important obstacle for using that distro on such machinery.

If that's your way of asking if we'll help you with patches to make
yum faster, then yes we will ... up to a point.
It should not surprise you that hardware from 6 to over 10 years ago
is not going to be what most people are developing or testing with.

I mentioned all that just to point out that /yum is slow/ concerning responsiveness in general, even if it's not something you can feel on modern machines that most people (including me) use for development. Perhaps I should have skipped that intro...

So I've been doing some profiling on yum.
As far as "yum help" is concerned, I haven't reached any important
conclusions. Most time is consumed in ini-parsing, URL parsing and
python module initialisations.

It'd be nice to have some numbers. But I can confirm that on a
modernish machine "yum list yum" seems to take roughly a second, and
python init and ini-parsing are significant parts of that.

I am using the "Run Snake Run" program
( http://www.vrplumber.com/programming/runsnakerun/ ) which provides a nice graphical representation of the profile so I did not pay proper attention to raw numbers. I'll try to post numbers in the future.
I can also send screenshots. :-)


Really way too much diverse stuff to
try and improve something.

*shrug*, that's mostly what performance work is.

What I meant is that for the "yum help" case profiling didn't show a specific bottleneck inside a function. Just that a great number of different functions were called with no single one being a hotspot.


FYI functions to look into are
getReposFromConfig@xxxxxxxxxxx and readStartupConfig@xxxxxxxxx and
object initialisations (__init__.py?) in general.

As far as check-update goes, _buildPkgObjList@xxxxxxxxxxxxx takes by
far the most time. The current way it works is by doing one query to
sqlite returning all packages, and then manually parsing the result
for excludes and converting it to python objects, all done with
repetitive python code.

True.

Is there a reason for not using a proper SQL query for returning all
packages needed, excluding excludes?
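
As a rough illustration of the question above, the following sketch contrasts the two approaches on an in-memory database: fetching every row and filtering in Python, versus letting sqlite apply the exclude globs. The table layout is a simplified stand-in for the real primary.sqlite schema, the function names and sample rows are made up for this example, and it is written for Python 3 rather than the Python 2 that yum used at the time. It matches on package name only, so it is not a full model of yum's exclude handling.

```python
import sqlite3
from fnmatch import fnmatchcase

# Simplified, hypothetical stand-in for the "packages" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT, "
    "arch TEXT, epoch TEXT, version TEXT, release TEXT)"
)
conn.executemany(
    "INSERT INTO packages (name, arch, epoch, version, release) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("python", "i586", "0", "1.0", "1"),
        ("rpm-python", "i586", "0", "1.0", "1"),
        ("yum", "noarch", "0", "1.0", "1"),
        ("bash", "i586", "0", "1.0", "1"),
    ],
)

COLUMNS = "name, arch, epoch, version, release"


def tuples_filtered_in_python(conn, exclude_globs):
    """One query for everything, then a Python loop applies the globs."""
    cur = conn.execute("SELECT %s FROM packages" % COLUMNS)
    return [
        row for row in cur
        if not any(fnmatchcase(row[0], g) for g in exclude_globs)
    ]


def tuples_filtered_in_sql(conn, exclude_globs):
    """The globs become bound parameters of NOT GLOB clauses."""
    where = " AND ".join("name NOT GLOB ?" for _ in exclude_globs) or "1"
    sql = "SELECT %s FROM packages WHERE %s" % (COLUMNS, where)
    return conn.execute(sql, list(exclude_globs)).fetchall()


print(tuples_filtered_in_sql(conn, ["*python*"]))
```

Both functions return plain (n,a,e,v,r) tuples. sqlite's GLOB and Python's fnmatch agree on simple `*` and `?` patterns but differ in details such as character-class negation, so the two are only interchangeable for simple patterns.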

A few reasons, but are you sure you need to try that? If you just
stop the package creation, does that help? -- ie. have simplePkgList()
return the pkgtups without creating package objects first?

Yes, package creation together with excludes was a major slowdown for the check-update case. My patch reduced runtime from 20-30s (depending on the updates available) to 12s. I'm sure package objects are needed almost everywhere in yum, but they are costly.


I can see the following comment:

#  Note: If we are building the pkgobjlist, we don't exclude
# here, so that we can un-exclude later on ... if that matters.

Does that matter?

No, that comment needs to die. See the comment a couple of lines down
from it.

If we really take advantage of sqlite and build a query returning
exactly what we want, then why do we need to build a separate Python
PackageObject list?

I attach a patch which greatly reduces the time needed for check-update
by not populating the YumSqlitePackageSack objects and by
calculating updates using only the (n,a,e,v,r) list
returned. _buildPkgObjList is not even used. For this simple case it
works, so it makes me wonder...

What do you think? Is this preliminary patch in the right direction?
What do you propose for improving speed even further but not breaking
existing functionality?

Don't create returnPackageTuples() and change
PackageSack.simplePkgList(), just override simplePkgList() for
YumSqlitePackageSack().
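
A minimal sketch of that suggestion, using stand-in classes rather than yum's real ones: the base class keeps deriving tuples from package objects, and only the sqlite-backed subclass overrides simplePkgList() to read the tuples straight from the database. The class bodies, the Package helper, the objects_created counter and the table layout are all assumptions made for illustration.

```python
import sqlite3


class Package:
    """Hypothetical package object; only carries the (n,a,e,v,r) tuple."""

    def __init__(self, row):
        self.pkgtup = tuple(row)


class PackageSack:
    """Stand-in base class: tuples are derived from full package objects."""

    def returnPackages(self):
        raise NotImplementedError

    def simplePkgList(self):
        return [po.pkgtup for po in self.returnPackages()]


class SqlitePackageSack(PackageSack):
    """Stand-in sqlite sack: overrides simplePkgList() to skip the objects."""

    def __init__(self, conn):
        self.conn = conn
        self.objects_created = 0  # counter for the demonstration only

    def _rows(self):
        return self.conn.execute(
            "SELECT name, arch, epoch, version, release FROM packages"
        )

    def returnPackages(self):
        pkgs = [Package(row) for row in self._rows()]
        self.objects_created += len(pkgs)
        return pkgs

    def simplePkgList(self):
        # No Package objects are built here.
        return [tuple(row) for row in self._rows()]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (name TEXT, arch TEXT, epoch TEXT, "
    "version TEXT, release TEXT)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?, ?, ?)",
    [("yum", "noarch", "0", "1.0", "1"), ("bash", "i586", "0", "1.0", "1")],
)

sack = SqlitePackageSack(conn)
print(sack.simplePkgList(), sack.objects_created)
```

Callers of simplePkgList() on other sack types are unaffected, which is the point of overriding in the subclass instead of changing the base method.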

You are of course right, thanks. I'll try to provide a patch soon.


The patch (and later versions) are incomplete, you are only
implementing include.match and exclude.match from the excluder API.
You don't implement the matching properly, as you are running the
GLOBs only on package names.
You don't implement include.match properly, the traditional behaviour
is that a package has to pass _both_ "includepkgs" and "exclude" not
either.
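
The "both, not either" rule can be sketched as follows. This is a simplified model, not yum's excluder: it assumes an empty includepkgs list means "no restriction", and it matches each glob against the name and against name.arch only to show that matching is not limited to names. The function names and the exact set of candidate strings are assumptions.

```python
from fnmatch import fnmatchcase


def candidate_strings(name, arch):
    """Strings a glob is tried against (illustrative subset)."""
    return (name, "%s.%s" % (name, arch))


def matches_any(patterns, name, arch):
    return any(
        fnmatchcase(candidate, pattern)
        for pattern in patterns
        for candidate in candidate_strings(name, arch)
    )


def is_visible(name, arch, includepkgs=(), excludes=()):
    """A package must pass includepkgs AND must not hit an exclude."""
    if includepkgs and not matches_any(includepkgs, name, arch):
        return False
    return not matches_any(excludes, name, arch)


for pkg in (("python", "i586"), ("python-devel", "i586"), ("bash", "i586")):
    print(pkg, is_visible(*pkg, includepkgs=["python*"], excludes=["*-devel"]))
```

Here python-devel matches includepkgs but is still hidden because it also matches an exclude, and bash is hidden because it fails includepkgs even though no exclude touches it.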

I have been struggling to understand the internals of yum so anything you point out is useful, thanks for all the tips.

That's fine as a proof of concept, but you didn't mark the patches as
being that.

Sorry for not being clear with that, I'll try to make it clear: My patches are a proof of concept, I am sure they break a lot of stuff and I don't expect them to be incorporated in yum (at least without many changes). I just expected to raise some discussion regarding performance.

The concept I'm trying to prove is moving more logic to SQL and reducing python iterative code wherever possible. Since you chose to use a database backend I think it's sensible to try to avoid python-level caching of package objects and just use the implicit caching done by sqlite.


I doubt you've tried many exclusions, as I'm pretty sure sqlite will
fail (which is why we have the limits like PATTERNS_INDEXED_MAX).

For simple excludes like '*python*' it works but you are right, I haven't tried many others.


You can't alter the .sqlite files as you've done in the last version
of your patch ... ie. temporary tables can't be used.

I think that the .sqlite files are not being written at all; after all, I have been testing yum as non-root.


You've not given any results:

  1. How long did the old SQL query take.
  2. How long does the new SQL query take.
  3. How long does the python pkgExcluder code take.
  4. What is 2 vs. 3 for small/large exclusions.

The following measurements, taken directly from sqlite, should answer most of these. Sorry for not having numbers from Python right now:

$ sqlite3 /var/cache/yum/fedora/35d817e2bac701525fa72cec57387a2e3457bf32642adeee1e345cc180044c86-primary.sqlite
SQLite version 3.6.12
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .timer on
sqlite> create temp table excludedIds (pkgId text);
CPU Time: user 0.002999 sys 0.004999

sqlite> insert into excludedIds select pkgId from packages where name glob '*python*';
CPU Time: user 0.113982 sys 0.161976

sqlite> select count(*) from packages;
13289
CPU Time: user 0.003000 sys 0.018997

sqlite> select count(*) from packages where pkgid not in (select pkgid from excludedIds);
12825
CPU Time: user 0.211967 sys 0.149977


And for large exclusions:

sqlite> insert into excludedIds select pkgId from packages where name glob '*p*';
CPU Time: user 0.188971 sys 0.155977

sqlite> select count(*) from excludedIds;
6425
CPU Time: user 0.001000 sys 0.001000

sqlite> select count(*) from packages where pkgid not in (select pkgid from excludedIds);
7328
CPU Time: user 0.413937 sys 0.167974
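
The same kind of measurement can be scripted from Python. The sketch below is an assumption-laden reproduction, not the session above: it uses a synthetic in-memory table instead of the real primary.sqlite, wall-clock time from time.perf_counter() instead of sqlite's CPU timer, and a helper named timed() invented for the example. It also times a single-query variant without the temp table for comparison.

```python
import sqlite3
import time


def timed(conn, sql, params=()):
    """Run one statement; return (rows, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (pkgId TEXT PRIMARY KEY, name TEXT)")
# Synthetic data: every tenth package name contains "python".
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [
        ("id%d" % i, "python-mod%d" % i if i % 10 == 0 else "lib%d" % i)
        for i in range(10000)
    ],
)

pattern = "*python*"
conn.execute("CREATE TEMP TABLE excludedIds (pkgId TEXT)")
_, t_fill = timed(
    conn,
    "INSERT INTO excludedIds SELECT pkgId FROM packages WHERE name GLOB ?",
    (pattern,),
)
rows, t_not_in = timed(
    conn,
    "SELECT count(*) FROM packages "
    "WHERE pkgId NOT IN (SELECT pkgId FROM excludedIds)",
)
remaining_via_temp = rows[0][0]

rows, t_direct = timed(
    conn, "SELECT count(*) FROM packages WHERE name NOT GLOB ?", (pattern,)
)
remaining_direct = rows[0][0]

print("fill temp table: %.6fs" % t_fill)
print("NOT IN subquery: %.6fs -> %d" % (t_not_in, remaining_via_temp))
print("direct NOT GLOB: %.6fs -> %d" % (t_direct, remaining_direct))
```

In sqlite, TEMP tables are kept in a separate temporary database rather than in the main database file. Timings from a synthetic table like this say nothing about the real repository data, so they only show how the comparison could be set up.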


About pkgExcluder, I am positive that it was a significant slowdown inside _packageByKeyData(), called from _buildPkgObjList(). I attach a profile I found, created with the Python line_profiler module.

...and as I said above, it'd be nice to know how much time is taken up
with just "package object creation" as against the select + python
exclude.

Also check-updates isn't the best thing to measure, as it's not that
simple (requiring all pkg data to be loaded) and apparently doesn't
require much more than the pkgtups for most of the data (maybe that's
true for update/install/etc. in general though).

I had also measured "yum update" performance, where dependency resolving was by far the most expensive part (resolveDeps() and _checkFileRequires() in depsolve.py). I didn't mention it because I couldn't come up with a patch; how resolveDeps() works was way too complex for me. So I decided to try a simpler case, that of "check-update", which unfortunately turned out not to be that simple either.

You might want to come onto IRC #yum on FreeNode to talk to us
tomorrow.

I'll try to be there.


Thanks for your help,
Dimitris


P.S. What do you think about rpmsack performance? Have you seen the other mail I sent with questions regarding its performance?


--
James Antill -- james@xxxxxxx
_______________________________________________
Yum mailing list
Yum@xxxxxxxxxxxxxxxxx
http://lists.baseurl.org/mailman/listinfo/yum
Timer unit: 1e-06 s

File: /home/jimis/dist/src/yum-git/yum/yum/sqlitesack.py
Function: _packageByKeyData at line 713
Total time: 3.14929 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   713                                               @profile
   714                                               def _packageByKeyData(self, repo, pkgKey, data, exclude=True):
   715                                                   """ Like _packageByKey() but we already have the data for .pc() """
   716     22080       730805     33.1     23.2          if exclude and self._pkgExcludedRKD(repo, pkgKey, data):
   717                                                       return None
   718     22080        77707      3.5      2.5          if repo not in self._key2pkg:
   719         4           13      3.2      0.0              self._key2pkg[repo] = {}
   720         4           14      3.5      0.0              self._pkgname2pkgkeys[repo] = {}
   721     22080        89619      4.1      2.8          if data['pkgKey'] not in self._key2pkg.get(repo, {}):
   722     22080      1864888     84.5     59.2              po = self.pc(repo, data)
   723     22080        98254      4.4      3.1              self._key2pkg[repo][pkgKey] = po
   724     22080        69708      3.2      2.2              self._pkgtup2pkgs.setdefault(po.pkgtup, []).append(po)
   725     22080       102270      4.6      3.2              pkgkeys = self._pkgname2pkgkeys[repo].setdefault(data['name'], [])
   726     22080        39156      1.8      1.2              pkgkeys.append(pkgKey)
   727     22080        76858      3.5      2.4          return self._key2pkg[repo][data['pkgKey']]
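
The shares in the table above can be recomputed from its Time column. The short sketch below only redoes that arithmetic for the two most expensive lines, the exclude check and the package object creation; the dictionary layout and labels are chosen for the example.

```python
# Time column of the profile above, in microseconds (timer unit 1e-06 s),
# keyed by source line number.
times_us = {
    716: 730805,   # exclude check (_pkgExcludedRKD)
    718: 77707,
    719: 13,
    720: 14,
    721: 89619,
    722: 1864888,  # package object creation (self.pc)
    723: 98254,
    724: 69708,
    725: 102270,
    726: 39156,
    727: 76858,
}
hits = 22080  # packages processed, from the Hits column
total_us = sum(times_us.values())


def share(line_no):
    """Percentage of the function's total time spent on one line."""
    return 100.0 * times_us[line_no] / total_us


for line_no, label in ((716, "exclude check"), (722, "object creation")):
    per_pkg = times_us[line_no] / hits
    print("%-16s %5.1f%% of total, %.1f us per package"
          % (label, share(line_no), per_pkg))
print("both together    %5.1f%% of total" % (share(716) + share(722)))
```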
