[Yum] Yum and Bittorrent

jarito at lehigh.edu (Jarret R) · Thu Oct 21 08:43:44 2004

Konstantin Ryabitsev wrote:
> Well, there is another issue which your method does not address. If you 
> only torrent large files (say, over 50 Mb in size), then the vast 
> majority of packages is still downloaded from the server. What bogs down 
> large mirrors is not bandwidth -- we have it out the proverbial wazoo. 
> What bogs us down is processor and io load. If most requests are still 
> made to the servers (plus there is additional tracker load), does that 
> actually improve the situation?

This is part of what we want to test. We assume that BT's overhead is
too high for smaller files because of the protocol's design decisions,
but we can't quantify that yet. A .torrent file will almost always be
smaller than the file it describes. If we can reduce a server's upload
burden from one full copy of the file per downloader to one .torrent
file per downloader plus maybe 5 full copies for every 100 peers, I
think it will have a serious impact.
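To make that comparison concrete, here is a rough back-of-the-envelope
sketch in Python. The model is an assumption: every downloader fetches
the .torrent from the server, and the server seeds about 5 full copies
per 100 peers. The file sizes and peer count are made up for
illustration, not measurements.

```python
MB = 1024 * 1024


def server_upload_bytes(file_size, torrent_size, downloaders, copies_per_100_peers=5):
    """Return (http_bytes, bt_bytes) uploaded by the origin server.

    Assumed model: under plain HTTP the server sends one full copy per
    downloader; under BitTorrent it sends the .torrent metadata to every
    downloader plus roughly `copies_per_100_peers` full copies per 100 peers.
    """
    http_bytes = file_size * downloaders
    bt_bytes = torrent_size * downloaders + file_size * downloaders * copies_per_100_peers // 100
    return http_bytes, bt_bytes


if __name__ == "__main__":
    # Hypothetical numbers: a 650 MB ISO image, a 50 KB .torrent, 1000 downloaders.
    http, bt = server_upload_bytes(650 * MB, 50 * 1024, 1000)
    print(f"HTTP: {http / MB:.0f} MB, BitTorrent: {bt / MB:.0f} MB")
```

Under these assumptions the server's upload drops by roughly a factor
of 20. Whether real swarms actually need only ~5 seed copies per 100
peers is exactly what the beta would have to measure.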

As I've said, we are planning this public beta to flesh out these
issues and see what kind of impact we can have.

> I still think we're chasing the wrong goose trying to unite yum and 
> bittorrent, and it is better to have a bittorrent-like protocol for 
> syncing mirrors and a special download method for urlgrabber that would 
> grab byte-ranges from multiple servers. This would allow the mirror load 
> to drop significantly, thus improving the situation when half the world 
> sends us HTTP requests.

I assume that most mirrors use rsync as it is, so I am not sure how 
useful switching to BT would be, but it is something we will look into. 
Another point to remember is that the server doesn't have to be the 
tracker or the only initial seed.
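For comparison, the multi-server byte-range download suggested in the
quoted message could be planned roughly as below. This is only a
sketch: the mirror URLs are hypothetical, it does not use urlgrabber's
actual API, and it only builds the HTTP requests (one Range header per
mirror) without sending them. A mirror that honours the header answers
with 206 Partial Content.

```python
import urllib.request


def plan_ranges(file_size, mirrors):
    """Split [0, file_size) into one contiguous byte range per mirror.

    Returns (mirror, start, end) tuples with an inclusive `end`, matching
    the HTTP Range header convention "bytes=start-end".
    """
    chunk, extra = divmod(file_size, len(mirrors))
    plan, start = [], 0
    for i, mirror in enumerate(mirrors):
        length = chunk + (1 if i < extra else 0)  # spread the remainder
        if length == 0:
            continue
        plan.append((mirror, start, start + length - 1))
        start += length
    return plan


def build_requests(url_path, file_size, mirrors):
    """Hypothetical: assume every mirror serves the file at mirror + url_path."""
    return [
        urllib.request.Request(mirror + url_path, headers={"Range": f"bytes={s}-{e}"})
        for mirror, s, e in plan_ranges(file_size, mirrors)
    ]


if __name__ == "__main__":
    mirrors = ["http://mirror-a.example.org", "http://mirror-b.example.org"]
    for req in build_requests("/fedora/disc1.iso", 650 * 1024 * 1024, mirrors):
        print(req.full_url, req.get_header("Range"))
```

Each mirror then serves only a slice of every file, which spreads the
per-request IO across servers without needing a tracker.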

A tracker requires little bandwidth and could be run by a smaller
mirror, such as my university (Lehigh University). We don't have the
bandwidth to run a full mirror, but we could run a tracker. Then as
many repositories as we can recruit could do the initial seeding, since
they already have the file. When a new peer joins the swarm, it would
contact all of those repositories and request pieces of the file from
each, which would dramatically reduce the load on any single server.
This scenario assumes a lot, but I think that if the system is viable,
we will start to see setups like this.
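A toy model of that scenario, sketched in Python under simplifying
assumptions: `ToyTracker` is a hypothetical in-memory stand-in that
only records who is in each swarm (which is why a tracker needs so
little bandwidth), and pieces are requested round-robin from the
seeding repositories. Real BitTorrent clients use smarter strategies
such as rarest-first piece selection and choking.

```python
from collections import Counter, defaultdict
from itertools import cycle


class ToyTracker:
    """Hypothetical in-memory tracker: stores only peer addresses per
    torrent (keyed by info hash), never any file data."""

    def __init__(self):
        self.swarms = defaultdict(set)

    def announce(self, info_hash, peer):
        """Register `peer` and return the other peers already in the swarm."""
        others = sorted(self.swarms[info_hash] - {peer})
        self.swarms[info_hash].add(peer)
        return others


def assign_pieces(num_pieces, sources):
    """Spread piece requests round-robin over every known source (simplified)."""
    return {piece: src for piece, src in zip(range(num_pieces), cycle(sources))}


if __name__ == "__main__":
    tracker = ToyTracker()
    # Hypothetical repositories that already hold the file act as initial seeds.
    for repo in ["mirror-a.example.org", "mirror-b.example.org", "mirror-c.example.org"]:
        tracker.announce("disc1-infohash", repo)
    seeds = tracker.announce("disc1-infohash", "new-peer")
    # Number of pieces requested from each seed: the load is split across all of them.
    print(Counter(assign_pieces(300, seeds).values()))
```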

> But I would like to see and play with your work. There are some 
> interesting points there.

Thanks. I will post here when we are ready to open the beta to the 
public. Also, you mentioned that you run a mirror. One thing that would 
help us is to get our hands on a download log for a real mirror. This 
would allow us to make a more educated guess about which files should
be served via BT, instead of just using size as the deciding factor.
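As an illustration of what we might do with such a log, here is a
minimal Python sketch. It assumes Apache "common" log format lines,
counts only full (200) responses, totals bytes served per file (size
times popularity), and picks the smallest set of files covering 80% of
the traffic. The 80% threshold and the selection rule are hypothetical
choices, not a settled policy.

```python
import re
from collections import defaultdict

# Captures request path, status code and byte count from a common-log-format line.
LOG_RE = re.compile(r'"GET (\S+) [^"]*" (\d{3}) (\d+|-)')


def bytes_by_file(lines):
    """Total bytes served per path, counting only complete 200 responses."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_RE.search(line)
        if not m or m.group(2) != "200" or m.group(3) == "-":
            continue
        totals[m.group(1)] += int(m.group(3))
    return totals


def pick_bt_candidates(totals, share=0.8):
    """Smallest set of files that together account for `share` of all bytes served."""
    grand = sum(totals.values())
    picked, running = [], 0
    for path, nbytes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * grand:
            break
        picked.append(path)
        running += nbytes
    return picked


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [21/Oct/2004:08:00:00 -0400] "GET /iso/disc1.iso HTTP/1.1" 200 1000',
        '1.2.3.5 - - [21/Oct/2004:08:01:00 -0400] "GET /RPMS/foo.rpm HTTP/1.1" 200 150',
    ]
    print(pick_bt_candidates(bytes_by_file(sample)))
```

Ranking by total bytes rather than file size alone lets a small but
very popular package outrank a large file that almost nobody fetches.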

PS - There are several new features coming down the pike for BitTorrent 
that could significantly impact this system.

(1) Multi-Tracker specification - this would allow us to specify more 
than one tracker for each torrent. If the first went down, the 
downloader would just switch to the next one.

(2) Peer Sharing - Two trackers running the same torrent can share
peers. This will allow larger, longer-lasting swarms.

(3) Torrent File Sharing - Instead of running a separate BT instance
for every file shared via BT, we would start one instance for all the
files we want to share. The downloader then requests only the pieces
belonging to the files it wants. This should reduce the load on the
server, which would maintain just one swarm. We are looking into this
one, but we don't have anything yet.
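One way to picture item (3) is BitTorrent's existing multi-file
layout, where all files in a torrent are concatenated in order and cut
into fixed-size pieces. The Python sketch below, with hypothetical file
sizes, works out which piece indices a downloader would need for just
the files it wants. This is an illustration of that layout, not the
design we are actually working on.

```python
def piece_range(file_lengths, index, piece_length):
    """Inclusive (first_piece, last_piece) covering file `index` in a
    multi-file torrent whose files are concatenated in listed order."""
    start = sum(file_lengths[:index])  # byte offset where the file begins
    length = file_lengths[index]
    if length == 0:
        return None
    end = start + length - 1  # offset of the file's last byte
    return start // piece_length, end // piece_length


def pieces_for_wanted(file_lengths, wanted, piece_length):
    """Sorted piece indices needed to fetch only the files in `wanted`."""
    needed = set()
    for i in wanted:
        r = piece_range(file_lengths, i, piece_length)
        if r:
            needed.update(range(r[0], r[1] + 1))
    return sorted(needed)


if __name__ == "__main__":
    # Hypothetical repository: three packages, 256 KB pieces.
    sizes = [300_000, 1_200_000, 90_000]
    print(pieces_for_wanted(sizes, [2], 256 * 1024))
```

Pieces at file boundaries are shared between neighbouring files, so a
downloader may fetch a few bytes of files it did not ask for.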

Thanks,
Jarret Raim