On Thu, 2009-08-20 at 09:37 +0200, Jakub Narebski wrote:
> You would have the same (or at least quite similar) problems with
> deepening part (the 'incrementals' transfer part) as you found with my
> first proposal of server bisection / division of rev-list, and serving
> 1/Nth of revisions (where N is selected so packfile is reasonable) to
> client as incrementals. Yours is top-down, mine was bottom-up approach
> to sending series of smaller packs. The problem is how to select size
> of incrementals, and that incrementals are all-or-nothing (but see also
> comment below).

I've defined a way to do this which doesn't have the complexity of
bisect in GitTorrent, making the compromise that you can't guarantee
each chunk is exactly the same size. I'll have a crack at doing it
based on the rev-cache code in C, instead of the horrendously slow
Perl/Berkeley solution I have at the moment, to see how well it fares.

> Another solution would be to try to come up with some sort of stable
> sorting of objects so that packfile generated for the same parameters
> (endpoints) would be always byte-for-byte the same. But that might be
> difficult, or even impossible.

Delta compression is not repeatable enough for this. The first version
of GitTorrent assumed it would be an appropriate solution, and that
assumption didn't hold.

So, first you have to sort the objects. That's fine: --date-order is a
good starting point, and I reasoned that interleaving each commit's new
objects with the commit objects themselves would be a useful sort
order. You also need a tie-break for commits with the same commit
date; I just used the SHA-1 of the commit for that.

Finally, when making the packs, to avoid excessive transfer you have to
try to make sure that they are "thin" packs. Currently, thin packs can
only work by starting at the beginning of history and working forward,
which is the opposite of what happens most of the time in packs. I
think this is the source of much of the inefficiency, mentioned in my
other e-mail, that comes from chopping up the object lists. It might
be possible, if you could also know which earlier objects were using an
object as a delta base, to try delta'ing against all of those objects
and see which one results in the smallest delta.

> Well, we could send the list of objects in pack in order used later by
> pack creation to client (non-resumable but small part), and if packfile
> transport was interrupted in the middle client would compare list of
> complete objects in part of packfile against this manifest, and sent
> request to server with *sorted* list of object it doesn't have yet.
> Server would probably have to check validity of objects list first (the
> object list might be needed to be more than just object list; it might
> need to specify topology of deltas, i.e. which objects are base for which
> ones). Then it would generate rest of packfile.

Mmm. It's a bit chatty, that. Object lists add another 10-20% on top,
which I think should be avoidable: if the thin pack problem were
solved, along with the problem of some objects ending up in more than
one of the thin packs that are created, the overhead should shrink to
very little.

Sam
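
P.S. To make a couple of the ideas above concrete, here are some rough
sketches in C (the direction the real code is headed). Every name in
them is invented for illustration; none of it is actual GitTorrent or
rev-cache code.

First, the chunking compromise. One simple way to cut the stable
object list into roughly equal packs - not necessarily what my
implementation does - is to accumulate objects until a byte target is
passed, which is exactly why chunk sizes can't be guaranteed equal:

#include <stddef.h>

/*
 * Find the end of the next chunk: walk the stable-ordered object
 * list from 'start', accumulating object sizes until the running
 * total reaches 'target' bytes.  A chunk can overshoot the target
 * by up to one object, so chunks are only roughly equal in size.
 */
size_t next_chunk_end(const size_t *obj_size, size_t nr,
                      size_t start, size_t target)
{
        size_t bytes = 0, i = start;

        while (i < nr && bytes < target)
                bytes += obj_size[i++];
        return i;       /* one past the last object in the chunk */
}

A caller would just loop, handing each [start, end) range to the pack
writer, until start reaches nr.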
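
Second, the stable ordering itself: commits sorted by commit date
(newest first, as --date-order produces), tie-broken by the commit's
SHA-1.  This sketch ignores the parent-before-child constraint that
the real traversal also has to respect, and the struct is made up:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct commit_ent {
        unsigned long date;             /* commit timestamp */
        unsigned char sha1[20];         /* commit object name */
};

static int cmp_commit(const void *a_, const void *b_)
{
        const struct commit_ent *a = a_, *b = b_;

        if (a->date != b->date)
                return a->date > b->date ? -1 : 1;  /* newer first */
        return memcmp(a->sha1, b->sha1, 20);  /* SHA-1 tie-break */
}

int main(void)
{
        struct commit_ent c[3];
        int i;

        memset(c, 0, sizeof(c));
        c[0].date = 100; c[0].sha1[0] = 0xaa;
        c[1].date = 200;
        c[2].date = 100; c[2].sha1[0] = 0x01;

        qsort(c, 3, sizeof(c[0]), cmp_commit);

        /* expect 200 first, then the two date-100 commits by SHA-1 */
        for (i = 0; i < 3; i++)
                printf("%lu %02x\n", c[i].date, c[i].sha1[0]);
        return 0;
}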
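
Third, the client side of your resume idea, chatty or not: given the
manifest of object names the server sent up front, and the set of
complete objects recovered from the truncated packfile, compute the
*sorted* list of objects still needed.  Assuming both inputs are
already sorted by SHA-1, it's a single merge-style pass:

#include <stddef.h>
#include <string.h>

/*
 * Walk two sorted lists of 20-byte object names; copy those that
 * appear in 'manifest' but not in 'have' into 'missing'.  Returns
 * the number of missing objects, already in sorted order, ready to
 * be sent back to the server.
 */
size_t diff_manifest(const unsigned char (*manifest)[20], size_t n_manifest,
                     const unsigned char (*have)[20], size_t n_have,
                     unsigned char (*missing)[20])
{
        size_t i = 0, j = 0, k = 0;

        while (i < n_manifest) {
                int cmp = (j < n_have) ?
                        memcmp(manifest[i], have[j], 20) : -1;

                if (cmp < 0)
                        memcpy(missing[k++], manifest[i++], 20);
                else if (cmp > 0)
                        j++;    /* shouldn't happen if 'have' is a
                                   subset of the manifest, but be safe */
                else
                        i++, j++;       /* object already received */
        }
        return k;
}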