Re: Using bit torrent to retrieve RPMs for updates

Konstantin Ryabitsev <icon@xxxxxxxxxxxxxx> · Thu, 26 Feb 2004 20:16:47 -0500

On 26.02.2004 16:42, Jonathan Gardner wrote:
> Has anyone given serious thought to changing Yum so that it uses the 
> bittorrent protocol to retrieve RPMs? Especially in the case of updates, 
> when everyone and their grandmother needs to get the RPMs right away, this 
> would make a lot of sense. Yum could manage a repository of RPMs and 
> constantly serve those up so others can download parts of them via 
> bittorrent, all with permission, of course.

We have pondered this solution many times here, but there are 
several important drawbacks:

1. Bittorrent is highly inefficient for a large collection of small 
files. You will have to start a separate tracker item for each RPM, 
and for some of them the traffic generated just by tracking the p2p 
clients will outweigh the savings of using bittorrent. I would 
imagine that several thousand tracker items would also be quite 
processor-intensive.

2. You have to specifically punch holes in the firewall for 
bittorrent -- not one, but a range of ports, actually. Something 
most people will not do, so they will be constantly leeching.

3. Yum runs as root, so you suddenly have a very large amount of 
code (yum+bittorrent libs) listening as root for incoming 
connections. Yikes. Alternatively, you'd have to fork an 
unprivileged downloader process and communicate with it through 
some form of IPC. Either way is painful.

As you can see, bittorrent is not very beneficial here. However, a 
bittorrent-like system used by *mirrors* could be of benefit. E.g. 
the client side connects to the main server and says "I want 
foo-1.0-1.i386.rpm". The server then returns:

Checksum information for foo-1.0-1.i386.rpm:
bytes 0...10000: chksum1
bytes 10000...20000: chksum2
....
bytes n-10000...n: chksum n
The following servers claim to have it:
mirror.fooland.foo
mirror.barland.bar
....
mirror.bazland.baz
Go get it yourself.
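
For illustration, here is a minimal sketch of how the main server 
might compute the per-range checksums in a response like the one 
above. The function name build_manifest, the 10000-byte default 
chunk size, and the choice of SHA-256 are assumptions made for the 
example, not part of yum or any existing mirror software. Ranges 
are half-open (start included, end excluded) so that adjacent 
chunks do not overlap.

```python
import hashlib


def build_manifest(data: bytes, chunk_size: int = 10000):
    """Return a list of (start, end, checksum_hex) tuples covering data.

    Ranges are half-open: each entry covers bytes start..end-1.
    SHA-256 is an assumed checksum; any agreed-upon hash would do.
    """
    manifest = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        end = start + len(chunk)  # the last chunk may be shorter
        manifest.append((start, end, hashlib.sha256(chunk).hexdigest()))
    return manifest
```

Since the checksums depend only on the package contents, they 
could be computed once when the package is published rather than 
on every request.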

The client then connects to the mirrors and fetches the ranges 
specified in the server response, spreading them across the 
mirrors and thus creating a primitive swarm. The fetching can be 
done via http, ftp, and file, as they all allow reading from a 
byte offset (HTTP via Range requests, FTP via the REST command).
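
The client side of this could be sketched roughly as follows: 
spread the ranges over the mirrors round-robin, turn each range 
into an HTTP Range header, and check every fetched chunk against 
its checksum. The names plan_fetches and verify_chunk, the 
round-robin policy, and SHA-256 are assumptions made for the 
example; the actual network fetch is left out.

```python
import hashlib


def plan_fetches(ranges, mirrors):
    """Assign each half-open (start, end) range to a mirror, round-robin.

    Returns a list of (mirror, range_header) pairs. The header value
    uses the HTTP convention, where the end offset is inclusive.
    """
    if not mirrors:
        raise ValueError("no mirrors available")
    plan = []
    for i, (start, end) in enumerate(ranges):
        mirror = mirrors[i % len(mirrors)]
        plan.append((mirror, f"bytes={start}-{end - 1}"))
    return plan


def verify_chunk(chunk: bytes, expected_hex: str) -> bool:
    """Return True if the chunk matches its advertised checksum."""
    return hashlib.sha256(chunk).hexdigest() == expected_hex
```

A chunk that fails verification can simply be requested again 
from a different mirror, so one bad mirror costs a retry rather 
than a whole download.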

This would allow for auto-balancing the mirror load, though this 
solution is not without its own set of difficulties:

1. This still keeps thousands of trackers on the server, though 
with dedicated servers and far less tracker traffic than with 
bittorrent, this should in theory be easier to handle.

2. How do you keep the list of mirrors current? Should they stay 
constantly connected to the main server a la bittorrent clients? 
Should they use some other bittorrent-like protocol for syncing 
with each other?

3. As the tracker info for each package would be auto-generated, 
there's no way to sign it (this would require keeping the signing 
key on the server, which is a no-no). Attackers could potentially 
annoy a lot of people by publishing bogus mirror data pointing to 
odd places. This isn't really dangerous, though, since the final 
RPM assembled from bits and pieces fetched from various servers 
would still be cryptographically signed.

This could be a fun project to play with, if anyone likes to mess 
with things like that. :)

--
Konstantin ("Icon") Ryabitsev
Duke Physics Systems Admin, RHCE
I am looking for a job in Canada!
http://linux.duke.edu/~icon/cajob.ptml