Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> First of all: my main gripe with the discussed approach is that it uses
> bundles. I know, I introduced bundles, but they just seem too klunky and
> too static for the resumable clone feature.

While we should make the mechanism extensible so that we can later
support multiple "alternate resource" formats, and "bundle" could be
one of those options, my current thinking is that the initial version
should use just a bare packfile to bootstrap, not a bundle.

The format being "static" is both a feature and a practical
compromise.

It is a feature because it allows clone traffic, which is a
significant portion of the whole traffic a busy hosting site serves,
to be diverted off of the core server network, saving both networking
and CPU cost. That benefit will be felt even when the client has a
good enough connection to the server that it does not have to worry
about resuming (a sketch of the range-based resuming appears at the
end of this message).

It is a practical compromise in that the mechanism will not extend to
helping incremental fetches. But I hear that server-side statistics
tell us there are not many "duplicate incremental fetch" requests
(i.e. many clients holding the same set of "have"s, which would let
the server side prepare, serve, and cache one identical incremental
pack over a resumable transport that supports partial/range requests),
so I do not think it is practical to try to use the same mechanism for
both incremental and clone traffic. One size would not fit both here.

I think a better approach to help incremental fetches is along the
lines of what came up in the discussion with Al Viro and others the
other day. You would need various building blocks implemented anew,
including:

 - A protocol extension to allow the client to tell the server a list
   of "not necessarily connected" objects it has, so that the server
   side can exclude them from the set of objects the traditional
   "have"-"ack" exchange would otherwise determine to be sent when
   building a pack (a wire-format sketch appears below).

 - A design for deciding what "list of objects" is worth sending to
   the server side. The total number of objects on the receiving end
   is an obvious upper bound, and it might be sufficient to send the
   whole thing as-is, but there may be a more efficient way to
   determine this set [*1*].

 - A way to salvage objects from a truncated pack, as there is no
   such tool in core Git (a rough sketch appears below as well).

[Footnote]

*1* Once the traditional "have"-"ack" exchange determines the set of
objects the sender thinks the receiver may not have, we need to figure
out which of them already exist on the receiving end, either because
they were salvaged from truncated pack data received earlier, or
because they arrived by fetching from a side branch (e.g. two
repositories derived from the same upstream, such as somebody who
regularly interacts with the linux-next tree updating from Linus's
kernel tree), and exclude them from the set of objects the sender
transmits. I have long felt that Eppstein's invertible Bloom filter
might be a good way to efficiently determine, among the sets of
objects the sending and receiving ends hold, which ones are common,
but I have not looked into this deeply myself (a toy sketch closes
this message).
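To make the "static resource" point concrete, here is a minimal
sketch, in Python, of how a client could resume the download of a
statically served bootstrap packfile. The URL and file names are
hypothetical, and it assumes the alternate resource sits behind a
plain HTTP server that honors Range requests:

    import os
    import urllib.request

    def fetch_resumable(url, dest):
        # Resume from however many bytes we already have on disk.
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        req = urllib.request.Request(url)
        if have:
            req.add_header("Range", "bytes=%d-" % have)
        with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
            # 206 means the server resumed where we left off; a plain
            # 200 means it ignored Range, so start over from scratch.
            # (A file that is already complete would draw a 416 error;
            # handling that is omitted here.)
            if resp.status == 200 and have:
                out.seek(0)
                out.truncate()
            while True:
                chunk = resp.read(1 << 16)
                if not chunk:
                    break
                out.write(chunk)

    fetch_resumable("https://example.com/bootstrap.pack", "bootstrap.pack")

Note that this needs no smarts on the serving side at all, which is
exactly why the clone bootstrap can be pushed out to a dumb CDN.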
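For the protocol extension in the first bullet, the client side could
look something like this on the wire. This is a sketch under an
assumed capability: "have-unconnected" is a made-up command name, not
part of the actual Git protocol; only the pkt-line framing (a
four-hex-digit length that counts itself, with "0000" as the
flush-pkt) is the real wire format:

    def pkt_line(payload):
        # pkt-line: 4 hex digits of length (including the 4 digits
        # themselves), then the payload.
        line = payload + "\n"
        return "%04x%s" % (len(line) + 4, line)

    def advertise_unconnected(object_ids):
        # Tell the server about objects we hold that are not known to
        # be connected to any ref (e.g. ones salvaged from a truncated
        # pack), so it can drop them from the pack it is building.
        out = [pkt_line("have-unconnected " + oid) for oid in object_ids]
        out.append("0000")  # flush-pkt ends the advertisement
        return "".join(out)

    print(advertise_unconnected([
        "39c29c2d47f0e5fa58ad4558ef50e7adb3b5d75f",
        "d670460b4b4aece5915caf5c68d12f560a9fe3e4",
    ]))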
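For the salvage tool in the last bullet, a rough sketch. The entry
layout it parses (the type-and-size varint header, the optional
delta-base field, then a zlib stream) is the real pack format, but
this toy version recovers only whole, non-delta objects and gives up
at the first truncated stream; resolving deltas against recovered
bases is left as an exercise:

    import hashlib
    import struct
    import zlib

    TYPE_NAME = {1: b"commit", 2: b"tree", 3: b"blob", 4: b"tag"}

    def salvage(path):
        data = open(path, "rb").read()
        if data[:4] != b"PACK":
            raise ValueError("not a packfile")
        nr_objects = struct.unpack(">I", data[8:12])[0]
        pos, recovered = 12, []
        for _ in range(nr_objects):
            try:
                # Type-and-size varint header; the size is implied by
                # the zlib stream, so just consume continuation bytes.
                c = data[pos]; pos += 1
                obj_type = (c >> 4) & 7
                while c & 0x80:
                    c = data[pos]; pos += 1
                if obj_type == 6:      # ofs-delta: skip base offset varint
                    while data[pos] & 0x80:
                        pos += 1
                    pos += 1
                elif obj_type == 7:    # ref-delta: skip 20-byte base name
                    pos += 20
                # Inflate; the decompressor tells us where this entry's
                # compressed stream ends, i.e. where the next one starts.
                z = zlib.decompressobj()
                body = z.decompress(data[pos:])
                if not z.eof:
                    break              # stream cut short: truncation point
                pos = len(data) - len(z.unused_data)
                if obj_type in TYPE_NAME:
                    hdr = TYPE_NAME[obj_type] + b" %d\x00" % len(body)
                    recovered.append(hashlib.sha1(hdr + body).hexdigest())
            except (IndexError, zlib.error):
                break                  # header or stream was truncated
        return recovered

An object recovered this way is exactly what the "not necessarily
connected" advertisement in the first bullet would carry.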
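Finally, the footnote. Here is a toy version of the invertible Bloom
filter idea; the cell count and number of hash slots are illustrative
(real deployments size the table to roughly the expected symmetric
difference), and the actual data structure is described by Eppstein et
al. in "What's the Difference?":

    import hashlib

    K, CELLS = 3, 64

    def _h(data, salt):
        return int.from_bytes(
            hashlib.sha1(b"%d:%s" % (salt, data)).digest()[:8], "big")

    class IBF:
        def __init__(self):
            # Each cell: [count, xor of ids, xor of id checksums].
            self.cells = [[0, 0, 0] for _ in range(CELLS)]

        def insert(self, oid, sign=1):
            key, chk = int(oid, 16), _h(oid.encode(), 99)
            for i in range(K):
                c = self.cells[_h(oid.encode(), i) % CELLS]
                c[0] += sign; c[1] ^= key; c[2] ^= chk

        def subtract(self, other):
            for a, b in zip(self.cells, other.cells):
                a[0] -= b[0]; a[1] ^= b[1]; a[2] ^= b[2]

        def decode(self):
            # Repeatedly peel "pure" cells.  After subtracting the other
            # side's filter, count +1 means "only we have it" and -1
            # means "only they have it".  Decoding can fail if the
            # table is too small for the true difference.
            ours, theirs = set(), set()
            progress = True
            while progress:
                progress = False
                for c in self.cells:
                    if abs(c[0]) == 1:
                        oid = "%040x" % c[1]
                        if _h(oid.encode(), 99) == c[2]:
                            (ours if c[0] == 1 else theirs).add(oid)
                            self.insert(oid, -c[0])
                            progress = True
            return ours, theirs

One side would serialize its filter (a few dozen bytes per cell), the
other would subtract its own and decode; that yields the symmetric
difference of the two object sets, and anything in the sender's
candidate set that does not show up there is common to both ends and
can safely be excluded from the outgoing pack.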