On Thu, 18 Feb 2010, Michael Schultz wrote:
Hello all,
In my "spare time," I wrote a simple yum-plugin that uses the pycurl
library to download the RPM packages as a predownload hook. The goal
is to better use the client connection bandwidth by downloading multiple
packages at a time. I looked through the archives of this list and the dev
list, but only saw an attempt by someone to hack something into yum itself. This
was shot down, with the suggestion to write it as a plugin and make use of
pycurl to handle the downloads. I saw no follow-ups, so here I am.
At a high level, this plugin uses pycurl to download all the packages,
max_connections (16) at a time, before the yum client processes downloads.
Once all packages are downloaded, control is returned to yum, which sees
that it already has all the packages and begins the transaction as normal.
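The "max_connections at a time" idea can be sketched roughly as below. This is not the plugin's code: it swaps pycurl for the Python standard library (a thread pool plus urllib) so that it runs without extra dependencies, and the names predownload, fetch_to_file and MAX_CONNECTIONS are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import shutil
import urllib.request

MAX_CONNECTIONS = 16  # the value quoted above; the constant name is hypothetical


def fetch_to_file(url, dest_dir):
    """Download one URL into dest_dir and return the local path.

    Stdlib stand-in for a pycurl transfer. It assumes the URL ends in a
    plain file name (no query string).
    """
    local_path = os.path.join(dest_dir, os.path.basename(url))
    with urllib.request.urlopen(url) as resp, open(local_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return local_path


def predownload(urls, dest_dir, fetch=fetch_to_file,
                max_connections=MAX_CONNECTIONS):
    """Fetch every URL with at most max_connections transfers in flight.

    Returns the local paths in the same order as urls. Any exception raised
    by a transfer propagates to the caller when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(lambda url: fetch(url, dest_dir), urls))
```

A predownload hook would call something like predownload(package_urls, cache_dir) and then hand control back to yum. The fetch parameter is only there so the transfer function can be replaced, for example by a pycurl-based one.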
Right now the plugin is still in its infancy, so I am looking for quick
feedback. You can download the plugin from:
http://dev.beyond-syntax.com/yum/multithread.py
(for an example config file just replace .py with .conf). There are a few
things I would like to fix up, but haven't figured out how:
- Better output would be nice, but I'm not sure how to get there.
- It should already work with fastestmirror (since that orders the repos),
but it would be nice to integrate with presto too; I'm not sure how.
- Reordering downloads: start the biggest ones first and let smaller
ones fill the remaining connections in the meantime.
- Distribute the downloads over more than one mirror (if it does not already).
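The "biggest ones first" item can be illustrated with a small simulation. This is only a sketch under a simplifying assumption that every connection transfers at the same fixed rate, which real downloads sharing one client link will not do; the function names and the package list are invented for the example.

```python
import heapq


def order_largest_first(packages):
    """Return (name, size) pairs sorted by size, largest first."""
    return sorted(packages, key=lambda pkg: pkg[1], reverse=True)


def simulate_finish_time(packages, connections):
    """Estimate when the last download finishes, in size units.

    Each package, taken in the given order, goes to whichever connection
    becomes free first. Assumes equal, constant speed per connection.
    """
    free_at = [0] * connections  # time at which each connection is idle
    heapq.heapify(free_at)
    for _name, size in packages:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + size)
    return max(free_at)


# Hypothetical package list: four small packages and one large one.
packages = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("big", 4)]
as_listed = simulate_finish_time(packages, connections=2)
largest_first = simulate_finish_time(order_largest_first(packages),
                                     connections=2)
print(as_listed, largest_first)  # prints: 6 4
```

With the large package queued last, one connection sits idle while the other finishes it alone. Starting it first lets the small packages fill the second connection in the meantime.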
In my completely unscientific testing, I took a VM snapshot and ran yum
without presto and with '--downloadonly'. The update was 94 packages,
totaling 102 megabytes. (Both runs used the same mirror.)
- Multithread plugin downloaded all packages in 34 seconds.
- Normal yum downloaded all packages in 53 seconds.
Obviously, this depends strongly on the connection and on server
capacity, but I would assume any Fedora/RPM mirror has far more bandwidth
than the client. By opening more connections, multithread uses more of the
client's available bandwidth, which raises overall throughput.
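For reference, the two timings above work out as follows. The snippet only does arithmetic on the figures already quoted (102 megabytes, 34 and 53 seconds); the variable names are arbitrary.

```python
TOTAL_MB = 102        # size of the update
PLUGIN_SECONDS = 34   # run with the multithread plugin
STOCK_SECONDS = 53    # run with normal yum

plugin_rate = TOTAL_MB / PLUGIN_SECONDS   # average MB/s with the plugin
stock_rate = TOTAL_MB / STOCK_SECONDS     # average MB/s without it
speedup = STOCK_SECONDS / PLUGIN_SECONDS  # how many times faster
time_saved = 1 - PLUGIN_SECONDS / STOCK_SECONDS

print(f"plugin:     {plugin_rate:.2f} MB/s")  # plugin:     3.00 MB/s
print(f"normal yum: {stock_rate:.2f} MB/s")   # normal yum: 1.92 MB/s
print(f"speedup:    {speedup:.2f}x")          # speedup:    1.56x
print(f"time saved: {time_saved:.0%}")        # time saved: 36%
```

So this single run shows roughly a 1.56x speedup, or about a third less wall-clock time. It is one sample on one connection, as noted above.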
Questions, comments, observations?
Great!
This work notwithstanding - if you'd like to work on adding curlMulti()
support generally to urlgrabber - you'd be MOST welcome.
-sv