On Wed, Jan 5, 2011 at 4:23 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> Hi,
>
> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e the transfer protocol must be around git
> objects, not file chunks) with a bit difference.

> So it all looks good to me. It is resumable and verifiable. It can be
> fetched in parallel from many servers. It maps pretty good to
> BitTorrent (which means we can reuse BitTorrent design). All transfer
> should be compressed so the amount of transfer is also acceptable (not
> as optimized as upload-pack, but hopefully overhead is low). A minor
> point is latest commit (in full) will be available as soon as
> possible.

ok. what i wasn't aware of, about the bittorrent protocol, was that
multiple files, when placed into the same .torrent, are just
concatenated as one monolithic data block. the "chunking" is just then
slapped on top of that.

the end result is that it's possible that, if you want to get one
specific file (or, in this case "object" whether it be commit, blob or
tree) then you *might* end up getting two more "chunks" than you
actually need, which, worst case could end up being 2.999x the actual
data really required.

_this_ is the reason why people such as cameron dale criticised
bittorrent as a hierarchical / multi-file delivery mechanism (and
abandoned it in favour of apt-p2p), and i didn't understand this at the
time enough to be able to point out that i'd assumed they knew what i
was thinking :)

what i was thinking was "duhh!" don't slap multiple files into a single
.torrent, put each file (or, in this case "object" whether it be
commit, blob, tree or other) into a separate torrent! that's all -
problem goes away!

now that of course leaves you with the problem that you now have
potentially hundreds if not thousands or tens of thousands of .torrents
to deal with, publish, find etc. etc.
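
to put a rough number on the chunk overhead described above, here is a
minimal sketch. it assumes a fixed piece size (256 KiB is only an
illustrative value, not something from this thread), that every piece a
byte range touches has to be fetched whole, and that the payload is long
enough for all touched pieces to be full-size. the function names are
made up for the example.

```python
from typing import Tuple


def pieces_for_range(offset: int, length: int, piece_size: int) -> Tuple[int, int, int]:
    """Return (first_piece, last_piece, bytes_fetched) for one object.

    `offset` is where the object starts inside the concatenated payload,
    `length` is its size in bytes. Whole pieces are assumed to be fetched.
    """
    if length <= 0 or piece_size <= 0 or offset < 0:
        raise ValueError("offset must be >= 0, length and piece_size > 0")
    first = offset // piece_size
    last = (offset + length - 1) // piece_size
    fetched = (last - first + 1) * piece_size
    return first, last, fetched


def overhead_ratio(offset: int, length: int, piece_size: int) -> float:
    """Bytes fetched divided by bytes actually wanted."""
    _, _, fetched = pieces_for_range(offset, length, piece_size)
    return fetched / length


if __name__ == "__main__":
    piece = 256 * 1024  # illustrative piece size (assumption)

    # object aligned to a piece boundary and exactly one piece long
    print(overhead_ratio(0, piece, piece))

    # object two bytes longer than a piece, starting one byte before a
    # boundary: it touches three pieces, so the ratio is just under 3
    print(overhead_ratio(piece - 1, piece + 2, piece))
```

the second case is where a figure just under 3x comes from: an object
barely longer than one piece that straddles two boundaries. under the
same assumptions, objects much smaller than a piece fare worse, since a
whole piece is still fetched for them. one torrent per object avoids
this, which leaves only the many-.torrents problem just mentioned.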

and the solution to _that_ is to give the name of the .torrent file
something meaningful.... like.... ooo, how about... the object's md5
sum? :)

so _that_ problem's solved, which leaves just one more problem: how to
find such a ridiculously large number of objects in the first place,
and, surprise-surprise, there's a perfect solution to that, as well,
called DHTs. and, surprise-surprise, what do you need as the DHT key?
something like a 128-bit key? ooo, how about ... the object's md5 sum?
that's 128-bit, that'll do :)

but, better than that: there happens to have been announced very
recently an upgraded version of a bittorrent client, which claims to be
fantastic as it's no longer dependent on "internet search" sites,
because surprise-surprise, it uses peer-to-peer DHT to do the search
queries.

so not only is there a solution to the problems but also there's even a
suitable codebase to work from in order to create a working prototype.
now i just have to find the damn thing... ah yes, it's called Tribler.

l.
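
a rough sketch of the naming idea above: one .torrent per object, named
after the object's md5 sum, with the same 128-bit digest doubling as the
lookup key. everything here is an assumption for illustration: the
function names, the `ToyDHT` class (an in-memory dict standing in for a
real DHT, not Tribler's API) and the peer address strings. note that git
itself names objects by a 160-bit SHA-1, so the md5 here simply follows
the suggestion in this message.

```python
import hashlib
from typing import Dict, List, Set


def object_key(data: bytes) -> bytes:
    """128-bit (16-byte) md5 digest of the object, used as the lookup key."""
    return hashlib.md5(data).digest()


def torrent_name(data: bytes) -> str:
    """File name for the per-object torrent: <md5 hex>.torrent"""
    return hashlib.md5(data).hexdigest() + ".torrent"


class ToyDHT:
    """In-memory stand-in for a DHT: maps a key to the peers holding it."""

    def __init__(self) -> None:
        self._table: Dict[bytes, Set[str]] = {}

    def announce(self, key: bytes, peer: str) -> None:
        # record that `peer` can serve the object identified by `key`
        self._table.setdefault(key, set()).add(peer)

    def lookup(self, key: bytes) -> List[str]:
        # return known peers for `key`, or an empty list if none
        return sorted(self._table.get(key, set()))


if __name__ == "__main__":
    blob = b"hello"  # stand-in for the bytes of one git object
    dht = ToyDHT()
    dht.announce(object_key(blob), "peer-a.example:6881")  # hypothetical peer
    print(torrent_name(blob))
    print(dht.lookup(object_key(blob)))
```

the point of the sketch is only that a single digest can serve as both
the .torrent name and the DHT key, so nothing else has to be published
to make an object findable.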