Re: Resumable clone/Gittorrent (again)

On Wed, Jan 5, 2011 at 4:23 PM, Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> wrote:
> Hi,
>
> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e. the transfer protocol must be built around git
> objects, not file chunks) with a slight difference.

> So it all looks good to me. It is resumable and verifiable. It can be
> fetched in parallel from many servers. It maps pretty well to
> BitTorrent (which means we can reuse BitTorrent's design). All transfers
> should be compressed so the amount of data transferred is also acceptable (not
> as optimized as upload-pack, but hopefully the overhead is low). A minor
> point is that the latest commit (in full) will be available as soon as
> possible.

 ok.  what i wasn't aware of, about the bittorrent protocol, is that
multiple files, when placed into the same .torrent, are simply
concatenated into one monolithic data block; the "chunking" is
then slapped on top of that.  the end result is that if you want to
get one specific file (or, in this case, "object", whether it be a
commit, blob or tree) then you *might* end up getting up to two more
"chunks" than you actually need, which, worst case, could end up
being 2.999x the data really required.
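 to make that overhead concrete, here's a small python sketch (not real
bittorrent code; the piece size is an assumption) of how many chunks a
single file drags in when it sits misaligned inside a concatenated
multi-file torrent:

```python
# a sketch, not bittorrent code: pieces are cut across the concatenated
# stream, so a file rarely starts or ends exactly on a piece boundary.
PIECE_SIZE = 256 * 1024  # assumed piece size; real torrents vary

def pieces_needed(file_offset, file_length, piece_size=PIECE_SIZE):
    """return (first_piece, last_piece, bytes_fetched) for one file."""
    first = file_offset // piece_size
    last = (file_offset + file_length - 1) // piece_size
    return first, last, (last - first + 1) * piece_size

# worst case: a file just over one piece long, misaligned by one byte,
# spans three pieces - you fetch 3 full pieces for ~1 piece of data,
# a ratio just under 3x ("2.999x")
first, last, fetched = pieces_needed(PIECE_SIZE - 1, PIECE_SIZE + 2)
print(first, last, fetched / (PIECE_SIZE + 2))
```

a perfectly aligned file the size of one piece costs exactly one piece;
it's the boundary misalignment that creates the waste.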

 _this_ is the reason why people such as cameron dale criticised
bittorrent as a hierarchical / multi-file delivery mechanism (and
abandoned it in favour of apt-p2p); i didn't understand this well
enough at the time to point it out, and had just assumed they knew
what i was thinking :)

 what i was thinking was "duhh!" - don't slap multiple files into a
single .torrent; put each file (or, in this case, "object", whether it
be commit, blob, tree or other) into a separate torrent!  that's all -
the problem goes away!

 now that of course leaves you with the problem that you have
potentially hundreds, if not thousands or tens of thousands, of
.torrents to deal with, publish, find, etc.   and the solution to
_that_ is to give the .torrent file a meaningful
name.... like.... ooo, how about... the object's sha-1 hash? :)
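 for reference (and it's sha-1 in git, not md5): a git object's name is
the sha-1 of a small header plus the content, so it's already a
ready-made, collision-resistant torrent filename.  a sketch:

```python
# sketch: deriving a .torrent filename from a git object's name.
# git names a blob by the sha-1 of "blob <size>\0" + content.
import hashlib

def git_blob_name(data: bytes) -> str:
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

torrent_name = git_blob_name(b"hello world\n") + ".torrent"
print(torrent_name)
```

the same naming scheme works for commits and trees, just with a
different header tag.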

 so _that_ problem's solved, which leaves just one more problem: how
to find such a ridiculously large number of objects in the first
place, and, surprise-surprise, there's a perfect solution to that as
well, called DHTs.  and, surprise-surprise, what do you need as the
DHT key?  something like a 160-bit key?   ooo, how about ... the
object's sha-1 hash? that's 160-bit, that'll do :)
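 as a sketch of how neatly the keyspaces line up (the function below is
illustrative kademlia-style xor distance, not mainline DHT code):
mainline DHT keys are 160-bit sha-1 values, exactly the width of a git
object name, so an object name can be used as a lookup key directly:

```python
# sketch: kademlia-style xor distance over 160-bit hex keys.
# git object names could serve as DHT keys unchanged - this is the
# proposal's assumption, not an existing protocol.

def xor_distance(key_a: str, key_b: str) -> int:
    """kademlia distance between two 160-bit hex keys."""
    return int(key_a, 16) ^ int(key_b, 16)

obj  = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
node = "3b18e512dba79e4c8300dd08aeb37f8e728b8dae"
print(xor_distance(obj, node))  # small distance = close in the keyspace
```

a lookup walks toward the nodes whose ids have the smallest xor
distance to the object's name, and asks them who's seeding it.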

 but, better than that: a bittorrent client was announced very
recently which claims to be fantastic because it's no longer dependent
on "internet search" sites: surprise-surprise, it uses a peer-to-peer
DHT to do the search queries.

 so not only is there a solution to the problems but also there's even
a suitable codebase to work from in order to create a working
prototype.

 now i just have to find the damn thing... ah yes, it's called Tribler.

 l.

