yum performance

Hello list,

I have been using Fedora on various machines, many of which are fairly old, so I'm constantly trying to trim unnecessary fat and make things faster. Unfortunately, when the basic package manager itself is slow, there is only so much that can be done.

Running just "yum help" on an 800MHz PC with Fedora 11 takes about 2.2s. Running "yum check-update" takes more than 20s to return an empty list. Many other yum commands are slow too, but I thought I should start with the simplest ones. All measurements were made after warming up, i.e. yum had already been run once to bring its data into the cache and update its metadata.

Perhaps I shouldn't even mention how slow yum (an older version) is on an old SPARCstation 5 running Aurora Linux. Operations take hours and the machine swaps constantly. It is the biggest obstacle to using that distro on such hardware.


So I've been doing some profiling on yum.
As far as "yum help" is concerned, I haven't reached any important conclusions. Most of the time is spent in ini parsing, URL parsing and Python module initialisation. The cost is spread over too many unrelated places for any single fix to make a real difference. FYI, the functions to look into are getReposFromConfig@xxxxxxxxxxx and readStartupConfig@xxxxxxxxx, and object initialisation (__init__.py?) in general.
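
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of profiling a startup path with the standard cProfile and pstats modules. The functions parse_ini_lines and startup are hypothetical stand-ins, not yum code; only the profiling calls matter.

```python
import cProfile
import io
import pstats


def parse_ini_lines(lines):
    """Hypothetical stand-in for config parsing; not yum code."""
    result = {}
    for line in lines:
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def startup():
    """Hypothetical stand-in for the work done before a command runs."""
    lines = ["key%d = value%d" % (i, i) for i in range(20000)]
    return parse_ini_lines(lines)


def profile_report(func, limit=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_report(startup)
print(report)
```

Sorting by cumulative time shows which top-level calls dominate; sorting by "tottime" instead highlights the individual functions where the time is actually spent.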

As far as check-update goes, _buildPkgObjList@xxxxxxxxxxxxx takes by far the most time. It currently runs a single sqlite query that returns all packages, then filters the result for excludes and converts each row to a Python object by hand, all in repetitive Python code.

Is there a reason for not using a proper SQL query for returning all packages needed, excluding excludes? I can see the following comment:
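
To make the question concrete, here is a minimal standalone sketch (not yum code) comparing the two approaches on an in-memory database. The table layout, the sample rows and the helper names tuples_via_sql and tuples_via_python are assumptions for illustration, and only arch filtering and name-glob excludes are modelled.

```python
import fnmatch
import sqlite3

COLUMNS = "name, arch, epoch, version, release"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (pkgKey INTEGER PRIMARY KEY, name TEXT, "
    "arch TEXT, epoch TEXT, version TEXT, release TEXT)"
)
conn.executemany(
    "INSERT INTO packages (%s) VALUES (?, ?, ?, ?, ?)" % COLUMNS,
    [
        ("bash", "i586", "0", "4.0", "6.fc11"),
        ("bash", "ppc", "0", "4.0", "6.fc11"),
        ("kernel", "i586", "0", "2.6.29", "1.fc11"),
        ("kernel-devel", "i586", "0", "2.6.29", "1.fc11"),
        ("yum", "noarch", "0", "3.2.23", "3.fc11"),
    ],
)


def tuples_via_sql(conn, allowed_arches, exclude_globs):
    """Let sqlite do the filtering; every value is passed as a bound parameter."""
    conditions, params = [], []
    if allowed_arches:
        placeholders = ",".join(["?"] * len(allowed_arches))
        conditions.append("arch IN (%s)" % placeholders)
        params.extend(allowed_arches)
    for pattern in exclude_globs:
        conditions.append("NOT (name GLOB ?)")
        params.append(pattern)
    query = "SELECT %s FROM packages" % COLUMNS
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    query += " ORDER BY pkgKey"
    return [tuple(row) for row in conn.execute(query, params)]


def tuples_via_python(conn, allowed_arches, exclude_globs):
    """Fetch everything, then filter row by row in Python."""
    result = []
    query = "SELECT %s FROM packages ORDER BY pkgKey" % COLUMNS
    for row in conn.execute(query):
        if allowed_arches and row[1] not in allowed_arches:
            continue
        if any(fnmatch.fnmatchcase(row[0], g) for g in exclude_globs):
            continue
        result.append(tuple(row))
    return result


via_sql = tuples_via_sql(conn, ["i586", "noarch"], ["kernel*"])
via_python = tuples_via_python(conn, ["i586", "noarch"], ["kernel*"])
print(via_sql == via_python, via_sql)
```

Both functions return the same tuples; the difference is that the SQL version never materialises the excluded rows in Python at all.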

#  Note: If we are building the pkgobjlist, we don't exclude
# here, so that we can un-exclude later on ... if that matters.

Does that matter?

If we really take advantage of sqlite and build a query that returns exactly what we want, then why do we need to build a separate Python PackageObject list?

I attach a patch that greatly reduces the time needed for check-update by not populating the YumSqlitePackageSack objects and by calculating updates using only the (n,a,e,v,r) list returned. _buildPkgObjList is not even used. For this simple case it works, so it makes me wonder...
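
One open point in the patch is its TODO about GLOB being case sensitive. The sketch below shows the difference and one possible way to translate shell-style wildcards to LIKE. The helper glob_to_like is hypothetical, handles only * and ? (not [...] character classes), and relies on SQLite's default behaviour that LIKE ignores case for ASCII characters.

```python
import sqlite3


def glob_to_like(pattern, escape="\\"):
    """Translate * and ? wildcards to LIKE syntax, escaping LIKE's own
    wildcards (% and _) so they match literally. Hypothetical helper."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")
        elif ch == "?":
            out.append("_")
        elif ch in ("%", "_", escape):
            out.append(escape + ch)
        else:
            out.append(ch)
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT)")
conn.executemany(
    "INSERT INTO packages (name) VALUES (?)",
    [("PackageKit",), ("packagekit-glib",), ("foo_bar",), ("fooxbar",)],
)


def match_glob(conn, pattern):
    """Case-sensitive match using GLOB."""
    cur = conn.execute(
        "SELECT name FROM packages WHERE name GLOB ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in cur]


def match_like(conn, pattern):
    """Case-insensitive (ASCII) match using LIKE with an explicit ESCAPE."""
    cur = conn.execute(
        "SELECT name FROM packages WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (glob_to_like(pattern),),
    )
    return [row[0] for row in cur]


glob_hits = match_glob(conn, "packagekit*")
like_hits = match_like(conn, "packagekit*")
print(glob_hits, like_hits)
```

With GLOB only the lower-case name matches, while the LIKE version also finds "PackageKit". The escaping step matters because package names can contain underscores, which LIKE would otherwise treat as a single-character wildcard.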

What do you think? Is this preliminary patch in the right direction? What do you propose for improving speed even further but not breaking existing functionality?


Thanks in advance,
Dimitris
diff --git a/yum/packageSack.py b/yum/packageSack.py
index b71356a..1f574b8 100644
--- a/yum/packageSack.py
+++ b/yum/packageSack.py
@@ -921,9 +921,8 @@ class PackageSack(PackageSackBase):
         """returns a list of pkg tuples (n, a, e, v, r) optionally from a single repoid"""
         
         # Don't cache due to excludes
-        return [pkg.pkgtup for pkg in self.returnPackages(patterns=patterns,
-                                                          ignore_case=False)]
-                       
+        return self.returnPackageTuples(patterns=patterns, ignore_case=False)
+
     def printPackages(self):
         for pkg in self.returnPackages():
             print pkg
diff --git a/yum/sqlitesack.py b/yum/sqlitesack.py
index 643f1f6..73ce15f 100644
--- a/yum/sqlitesack.py
+++ b/yum/sqlitesack.py
@@ -1512,7 +1512,85 @@ class YumSqlitePackageSack(yumRepo.YumPackageSack):
             self._pkgnames_loaded.update([po.name for po in returnList])
 
         return returnList
-                
+
+    def returnPackageTuples(self, ignore_case=False, patterns=None):
+        """Returns a list of n,a,e,v,r tuples with all packages minus excludes
+        """
+        
+        # TODO: sqlite GLOB is case sensitive so even though it's handy because of
+        #	its wildcards, perhaps we should use LIKE and transform wildcards
+        def buildQuery():
+            """Build a query in the following form:
+
+SELECT name, arch, epoch, version, release FROM packages
+WHERE NOT  # NOT because the following lines give 
+           # excluded packages, but we want the opposite
+	NOT
+	(pkgName GLOB self._pkgExcluder[i][2].lower() 
+		(only if self._pkgExcluder[i][1]=="include.match")
+	)
+	AND
+	(
+		(repo = self._excludes[i][0] AND
+		pkgKey = self._excludes[i][1])
+		OR
+		repo IN (self._all_excludes[i])
+		OR
+		arch NOT IN (self._arch_allowed[i])
+		OR
+		(pkgName GLOB self._pkgExcluder[i][2].lower() 
+			(only if self._pkgExcluder[i][1]=="exclude.match")
+		)
+	)"""
+
+            import itertools
+
+            incl_vars= [ i[2].lower() for i in self._pkgExcluder if i[1]=="include.match" ]
+            incl_q1= " OR ".join( [" (name GLOB ?) "] * len(incl_vars) )
+            
+            excl_L=[]
+            # itertools.chain seems the most elegant way to flatten a nested list
+            excl_vars1= list(itertools.chain(*self._excludes))
+            excl_q1= " OR ".join( [" (repo = ? AND pkgKey = ?) "] * (len(excl_vars1)/2) )
+            if len(excl_vars1)>0:
+                excl_L+= [excl_q1]
+            excl_vars2= list(self._all_excludes)
+            excl_q2= "repo IN (" + ",".join( ["?"] * len(excl_vars2)  ) + ")"
+            if len(excl_vars2)>0:
+                excl_L+= [excl_q2]
+            excl_vars3= list(self._arch_allowed)
+            excl_q3= "arch NOT IN (" + ",".join( ["?"] * len(self._arch_allowed) ) + ")"
+            if len(excl_vars3)>0:
+                excl_L+= [excl_q3]
+            excl_vars4= [ i[2].lower() for i in self._pkgExcluder if i[1]=="exclude.match" ]
+            excl_q4= " OR ".join( [" (name GLOB ?) "] * len(excl_vars4) )
+            if len(excl_vars4)>0:
+                excl_L+= [excl_q4]
+            excl_q= " OR ".join(excl_L)
+            excl_vars= excl_vars1 + excl_vars2 + excl_vars3 + excl_vars4
+
+            q="SELECT name, arch, epoch, version, release FROM packages"
+            if len(incl_vars)>0 or len(excl_vars)>0:
+                q+= " WHERE NOT "
+                if len(incl_vars)>0:
+                    q+= " NOT (" + incl_q1 + ")"
+                    if len(excl_vars)>0:
+                        q+= " AND "
+                if len(excl_vars)>0:
+                    q+= "(" + excl_q + ")"
+
+            return q, incl_vars+excl_vars
+
+        returnList=[]
+        (q,v)= buildQuery()
+        for (repo,cache) in self.primarydb.items():
+            print repo, q, v
+            cur = cache.execute(q, v)
+            returnList.extend(cur.fetchall())
+        return [tuple(i) for i in returnList]
+            
+        
+
     def returnPackages(self, repoid=None, patterns=None, ignore_case=False):
         """Returns a list of packages, only containing nevra information. The
            packages are processed for excludes. Note that patterns is just
_______________________________________________
Yum mailing list
Yum@xxxxxxxxxxxxxxxxx
http://lists.baseurl.org/mailman/listinfo/yum
