On Sun, Sep 5, 2010 at 2:32 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Sat, 4 Sep 2010, Luke Kenneth Casson Leighton wrote:
>
>> so, i believe that a much simpler algorithm is to follow nicolas'
>> advice, and:
>>
>> * split up a pack-index file by its fanout (1st byte of SHAs in the idx)
>> * create SHA1s of the list of object-refs within an individual fanout
>> * compare the per-fanout SHA1s remote and local
>> * if same, deduce "oh look, we have that per-fanout list already"
>> * grab the per-fanout object-ref list using standard p2p filesharing
>>
>> in this way you'd end up breaking down e.g. 50mb of pack-index (for
>> e.g. linux-2.6.git) into roughly 200k chunks, and you'd exchange
>> roughly 50k of network traffic to find out that you'd got some of
>> those fanout object-ref-lists already.  which is nice.
>
> Scrap that idea -- this won't work.  The problem is that, by nature,
> SHA1 is totally random.  So if you have, say, 256 objects to transfer
> (and 256 objects is not that much) then, statistically, the probability
> that the SHA1s for those objects end up spread across all 256 fanouts
> is quite high.  The algorithm I mentioned completely breaks down in
> that case.

mmm... that's not so bad.  requesting a table/pseudo-file with 1 fanout
or with 256 fanouts is still only one extra round-trip.  if i split it
into pseudo-subdirectories _then_ yes you'd have 256 requests, but that
can be avoided with a bit of work.  so, no biggie :)

l.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
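[Editor's note: the per-fanout comparison described in the quoted message, and Nicolas's uniform-distribution objection, can be sketched roughly as follows. This is an illustrative Python sketch, not real git code: the function names are hypothetical, object ids are hex strings, and a real implementation would parse the binary pack `.idx` format rather than take id lists directly.]

```python
import hashlib
from collections import defaultdict

def fanout_digests(object_ids):
    """Bucket object ids by their first byte (the .idx fanout) and
    return a {fanout: SHA-1 of the sorted id list} table.

    Two sides holding the same set of ids under a fanout will compute
    the same digest, so exchanging the (at most 256-entry) digest table
    is enough to discover which per-fanout lists are already shared."""
    buckets = defaultdict(list)
    for oid in object_ids:
        buckets[bytes.fromhex(oid)[0]].append(oid)
    return {
        fan: hashlib.sha1("".join(sorted(ids)).encode()).hexdigest()
        for fan, ids in buckets.items()
    }

def fanouts_to_fetch(local_ids, remote_digests):
    """Compare our per-fanout digests against the remote's table and
    return the fanouts whose object-ref lists we still need to fetch
    (e.g. via p2p filesharing, as in the proposal above)."""
    local = fanout_digests(local_ids)
    return sorted(fan for fan, digest in remote_digests.items()
                  if local.get(fan) != digest)

def expected_nonempty_fanouts(n):
    """Nicolas's objection, quantified: with n uniformly random SHA-1s,
    the expected number of non-empty fanouts is 256 * (1 - (255/256)**n).
    At n = 256 that is already about 162 of the 256 buckets, so nearly
    every per-fanout digest differs and the dedup wins little."""
    return 256 * (1 - (255 / 256) ** n)
```

The round-trip count Luke defends still holds under this sketch: the digest table is fetched in one request regardless of how many fanouts it covers; only splitting each fanout into its own request would cost up to 256 round-trips.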