Martin Fick <mfick@xxxxxxxxxxxxxx> writes:

> Sorry for the long winded rant. I suspect that some variation of all
> my suggestions have already been suggested, but maybe they will
> rekindle some older, now useful thoughts, or inspire some new ones.
> And maybe some of these are better to pursue then more parallelism?

We avoid doing a grand design document without having some prototype
implementation, but I think the limitations of the current protocol
have become apparent enough that we should do something about them,
and we should do it in a way that different implementations of Git can
all implement.

I think "multi-threaded clone" is the wrong title for this discussion,
in that the user does not care whether it is done by multi-threading
the current logic or in any other way. The user just wants a faster
clone.

In addition, the current "fetch" protocol has the following problems
that limit us:

 - It is not easy to make it resumable, because we recompute every
   time. This is especially problematic for the initial fetch, aka
   "clone", as it involves a large transfer [*1*].

 - The protocol extension has a fairly low length limit [*2*].

 - Because the protocol exchange starts with the server side
   advertising all its refs, even when the fetcher is interested in a
   single ref, the initial overhead is nontrivial, especially when you
   are doing a small incremental update. The worst case is an
   auto-builder that polls every five minutes, even when there are no
   new commits to be fetched [*3*].

 - Because we recompute every time, taking into account what the
   fetcher has in addition to what it obtained earlier from us in order
   to reduce the transferred bytes, the payload for incremental updates
   becomes tailor-made for each fetch and cannot easily be reused [*4*].

I'd like to see a new protocol that lets us overcome the above
limitations (did I miss others? I am sure people can help here)
sometime this year.
[Footnotes]

*1* The "first fetch this bundle from elsewhere and then come back here
for incremental updates" idea raised earlier in this thread may be a
way to alleviate this, as the large bundle can be served from a static
file.

*2* An earlier "this symbolic ref points at that concrete ref" attempt
failed because of this, and we only talk about HEAD.

*3* A new "fetch" protocol must avoid this "one side blindly gives a
large message as the first thing" pattern. I have been toying with the
idea of making the fetcher talk first, by declaring "I am interested in
your refs that match refs/heads/* or refs/tags/*, and I have a superset
of objects that are reachable from the set of refs' values X you gave
me earlier", where X is a small token generated by hashing the output
from "git ls-remote $there refs/heads/* refs/tags/*".

In the best case, where the server understands what X is and has
cached pack data, it can then send:

 - the differences in the refs that match the wildcards (e.g. "Back
   then at X I did not have refs/heads/next but now I do and it points
   at this commit. My refs/heads/master is now at that commit. I no
   longer have refs/heads/pu. Everything else in the refs/ hierarchy
   you are interested in is the same as state X").

 - the new name of the state Y (again, the hashed value of the output
   from "git ls-remote $there refs/heads/* refs/tags/*"), so that the
   receiving end can verify the above differences.

 - the cached pack data that contains all necessary objects between X
   and Y.

Note that the above would work if and only if we accept that it is OK
to send objects between the remote-tracking branches the fetcher has
(i.e. the objects it last fetched from the server) and the current tips
of branches the server has, without optimizing by taking into account
that some commits in that set may have already been obtained by the
fetcher from a third party.
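The ref bookkeeping of the exchange in *3* can be sketched in a few
lines of Python. This is a minimal illustration under stated
assumptions, not Git's code or wire format: refs are modeled as a dict
from refname to hex object name, wildcards are matched with fnmatch
globbing, and the state token is a SHA-1 over an ls-remote-like
listing sorted by refname. The names select, state_token, ref_delta and
apply_delta are hypothetical.

```python
import fnmatch
import hashlib

# Hypothetical model: a ref state is a dict mapping refname -> hex object name.
PATTERNS = ("refs/heads/*", "refs/tags/*")


def select(refs, patterns=PATTERNS):
    """Keep only the refs matching one of the wildcards (fnmatch-style; an assumption)."""
    return {name: oid for name, oid in refs.items()
            if any(fnmatch.fnmatchcase(name, p) for p in patterns)}


def state_token(refs, patterns=PATTERNS):
    """Name a state by hashing an ls-remote-like listing: one
    "<oid> TAB <refname> LF" line per selected ref, sorted by refname."""
    lines = (f"{oid}\t{name}\n" for name, oid in sorted(select(refs, patterns).items()))
    return hashlib.sha1("".join(lines).encode()).hexdigest()


def ref_delta(old, new, patterns=PATTERNS):
    """Server side: how the interesting refs changed from `old` to `new`.
    None means "I no longer have this ref"; unchanged refs are omitted."""
    old, new = select(old, patterns), select(new, patterns)
    delta = {name: oid for name, oid in new.items() if old.get(name) != oid}
    delta.update({name: None for name in old if name not in new})
    return delta


def apply_delta(refs, delta, expected_token, patterns=PATTERNS):
    """Fetcher side: apply the delta to the refs it had at state X, then
    verify that the result hashes to the state name Y the server announced."""
    result = select(refs, patterns)
    for name, oid in delta.items():
        if oid is None:
            result.pop(name, None)
        else:
            result[name] = oid
    if state_token(result, patterns) != expected_token:
        raise ValueError("ref delta does not reproduce the announced state")
    return result
```

With the example above (next appears, master moves, pu goes away),
ref_delta yields one entry per changed ref and nothing for unchanged
tags, so the message stays small; and because the fetcher recomputes
the token after applying it, a delta that does not lead to Y is
detected instead of silently accepted.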
If the server does not recognize state X (after all, it is just a SHA-1
hash value, so the server cannot recreate the set of refs and their
values from it unless it remembers), the exchange would have to
degenerate to the traditional transfer. The server would want to
recognize the result of hashing an empty string, though: a fetcher
that sends it is saying "I have nothing".

*4* The scheme in *3* can be extended to bring the fetcher up to date
step-wise. If the server's state was X when the fetcher last contacted
it, and since then the server has received multiple pushes and has two
snapshots of states, Y and Z, then the exchange may go like this:

  fetcher: I am interested in refs/heads/* and refs/tags/* and I have
           your state X.

  server:  Here is the incremental difference to the refs, and the end
           result should hash to Y. Here comes the pack data to bring
           you up to date.

  fetcher: (after receiving, unpacking and updating the remote-tracking
           refs) Thanks. Do you have more?

  server:  Yes, here is the incremental difference to the refs, and the
           end result should hash to Z. Here comes the pack data to
           bring you up to date.

  fetcher: (after receiving, unpacking and updating the remote-tracking
           refs) Thanks. Do you have more?

  server:  No, you are now fully up to date with me. Bye.
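The step-wise exchange in *4*, together with the fallbacks described at
the end of *3*, can be sketched as a small in-memory simulation.
Everything here is an assumption for illustration only: the Server
class and the next_step/fetch names are hypothetical, refs are dicts of
refname to object name already limited to the refs the fetcher is
interested in, and pack data is represented by placeholder strings. It
is not Git's wire protocol.

```python
import hashlib


def state_token(refs):
    """Hash an ls-remote-like listing of the refs, sorted by refname."""
    lines = (f"{oid}\t{name}\n" for name, oid in sorted(refs.items()))
    return hashlib.sha1("".join(lines).encode()).hexdigest()


def ref_delta(old, new):
    """Changed or added refs map to their new value; removed refs map to None."""
    delta = {name: oid for name, oid in new.items() if old.get(name) != oid}
    delta.update({name: None for name in old if name not in new})
    return delta


# Hash of an empty listing: a fetcher sending this is saying "I have nothing".
EMPTY_STATE = state_token({})


class Server:
    """Hypothetical server that remembers its snapshots (e.g. X, Y, Z), oldest first."""

    def __init__(self, snapshots):
        self.snapshots = [dict(s) for s in snapshots]
        self.tokens = [state_token(s) for s in self.snapshots]

    def next_step(self, have):
        """Answer "I have your state <have>; do you have more?".

        Returns (delta, new_token, pack), or None when the fetcher is up to date.
        """
        if have == self.tokens[-1]:
            return None  # "No, you are now fully up to date with me."
        if have in self.tokens:
            # Known snapshot: send only the step to the next snapshot.
            i = self.tokens.index(have)
            pack = f"cached pack {self.tokens[i][:7]}..{self.tokens[i + 1][:7]}"
            return (ref_delta(self.snapshots[i], self.snapshots[i + 1]),
                    self.tokens[i + 1], pack)
        if have == EMPTY_STATE:
            # Initial clone: everything, possibly served from a cached pack.
            return ref_delta({}, self.snapshots[-1]), self.tokens[-1], "cached full pack"
        # Unrecognized token: the server cannot recreate the refs from a hash.
        raise LookupError("unknown state; fall back to the traditional transfer")


def fetch(server, refs):
    """Fetcher loop: apply one step at a time, verifying each announced state."""
    refs = dict(refs)
    have = state_token(refs)
    packs = []
    while (step := server.next_step(have)) is not None:
        delta, new_token, pack = step
        for name, oid in delta.items():
            if oid is None:
                refs.pop(name, None)
            else:
                refs[name] = oid
        if state_token(refs) != new_token:
            raise ValueError("delta does not reproduce the announced state")
        packs.append(pack)  # stands in for receiving and unpacking the pack data
        have = new_token    # remote-tracking refs updated; ask for more
    return refs, packs
```

Because each step (X to Y, Y to Z) is fixed once the snapshots exist,
its pack could be computed once and handed to every fetcher that was at
X, which is the reuse that item [*4*] is after; the price, as noted in
*3*, is that objects the fetcher already got from a third party may be
sent again.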