On Sun, Sep 5, 2010 at 2:32 AM, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
> On Sat, 4 Sep 2010, Luke Kenneth Casson Leighton wrote:
>
>> so, i believe that a much simpler algorithm is to follow nicolas'
>> advice, and:
>>
>> * split up a pack-index file by its fanout (1st byte of SHAs in the idx)
>> * create SHA1s of the list of object-refs within an individual fanout
>> * compare the per-fanout SHA1s remote and local
>> * if same, deduce "oh look, we have that per-fanout list already"
>> * grab the per-fanout object-ref list using standard p2p filesharing
>>
>> in this way you'd end up breaking down e.g. 50mb of pack-index (for
>> e.g. linux-2.6.git) into roughly 200k chunks, and you'd exchange
>> roughly 50k of network traffic to find out that you'd got some of
>> those fanout object-ref-lists already.  which is nice.
>
> Scrap that idea -- this won't work.  The problem is that, by nature,
> SHA1 is totally random.  So if you have, say, 256 objects to transfer
> (and 256 objects is not that much) then, statistically, the probability
> that the SHA1s for those objects end up spread across all 256 fanouts
> is quite high.  The algorithm I mentioned completely breaks down in
> that case.

mmm... that's not so bad.  requesting a table/pseudo-file with 1 fanout
or with 256 fanouts is still only one extra round-trip.  if i split it
into pseudo-subdirectories _then_ yes you'd have 256 requests, but that
can be avoided with a bit of work.  so, no biggie :)

l.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
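[Editor's note: the per-fanout comparison described in the quoted message, and Nicolas's uniform-distribution objection, can be sketched roughly as follows. This is an illustrative Python sketch, not real git code: the function names are hypothetical, object ids are hex strings, and a real implementation would parse the binary pack `.idx` format rather than take id lists directly.]

```python
import hashlib
from collections import defaultdict

def fanout_digests(object_ids):
    """Bucket object ids by their first byte (the .idx fanout) and
    return a {fanout: SHA-1 of the sorted id list} table.

    Two sides holding the same set of ids under a fanout will compute
    the same digest, so exchanging the (at most 256-entry) digest table
    is enough to discover which per-fanout lists are already shared."""
    buckets = defaultdict(list)
    for oid in object_ids:
        buckets[bytes.fromhex(oid)[0]].append(oid)
    return {
        fan: hashlib.sha1("".join(sorted(ids)).encode()).hexdigest()
        for fan, ids in buckets.items()
    }

def fanouts_to_fetch(local_ids, remote_digests):
    """Compare our per-fanout digests against the remote's table and
    return the fanouts whose object-ref lists we still need to fetch
    (e.g. via p2p filesharing, as in the proposal above)."""
    local = fanout_digests(local_ids)
    return sorted(fan for fan, digest in remote_digests.items()
                  if local.get(fan) != digest)

def expected_nonempty_fanouts(n):
    """Nicolas's objection, quantified: with n uniformly random SHA-1s,
    the expected number of non-empty fanouts is 256 * (1 - (255/256)**n).
    At n = 256 that is already about 162 of the 256 buckets, so nearly
    every per-fanout digest differs and the dedup wins little."""
    return 256 * (1 - (255 / 256) ** n)
```

The round-trip count Luke defends still holds under this sketch: the digest table is fetched in one request regardless of how many fanouts it covers; only splitting each fanout into its own request would cost up to 256 round-trips.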