Josef Wolf <jw@xxxxxxxxxxxxx> writes:

> as we all know, files are identified by their SHA. Thus I had the impression
> that when transfering files, git would know by the SHA whether a given file is
> already available in the destination repository and the transfer would be of
> no use.

That is unfortunately not how things work.

It is not like the receiving end sends the names of all objects it has,
and the sending end excludes these objects from what it is going to send.

Consider this simple history with only a handful of commits (as usual,
time flows from left to right):

              E
             /
    A---B---C---D

where D is at the tip of the sending side, E is at the tip of the
receiving side.  The exchange goes roughly like this:

    (receiving side): what do you have?

    (sending side): my tip is at D.

    (receiving side): D?  I've never heard of it --- please give it
                      to me.  I have E.

    (sending side): E?  I don't know about it; must be something you
                    created since you forked from me.  Tell me about
                    its ancestors.

    (receiving side): OK, I have C.

    (sending side): Oh, C I know about.  You do not have to tell me
                    anything more.  A packfile to bring you up to
                    date will follow.

At this point, the sender knows that the receiver needs the commit D, and
the trees and blobs in D.  It also knows that the receiver has the commit C
and the trees and blobs in C.  It does the best thing it can do using this
(and only this) information, namely, to send the commit D, and send the
trees and blobs in D that are not in the commit C.

You may happen to have something in E that matches what is in D but not in
C.  Because the sender does not know anything about E at all in the first
place, that information cannot be used to reduce the transfer.

The sender theoretically _could_ also exploit the fact that any receiver
that has C must have B and A and all trees and blobs associated with these
ancestor commits [*1*], but that information is currently neither
discovered nor used during the object transfer.
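
To make the exchange above concrete, here is a toy model in Python.  It is
only a sketch of the reasoning, not git's actual wire protocol or pack
generation code; the object names (tree-d, blob-util-1 and so on) and the
helpers reachable() and find_common() are made up for illustration.

```python
from collections import deque

# Toy commit graph from the diagram: commit -> list of parents.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}

# Hypothetical trees and blobs in each commit's snapshot (names are made up).
SNAPSHOT = {
    "C": {"tree-c", "blob-readme-1", "blob-main-1"},
    "D": {"tree-d", "blob-readme-1", "blob-main-2", "blob-util-1"},
    "E": {"tree-e", "blob-readme-1", "blob-main-1", "blob-util-1"},
}


def reachable(tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, todo = set(), [tip]
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(PARENTS[commit])
    return seen


def find_common(receiver_tip, sender_commits):
    """Receiver names its commits, newest first, until the sender knows one."""
    seen, todo = set(), deque([receiver_tip])
    while todo:
        commit = todo.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in sender_commits:
            return commit  # the sender says "stop, I know this one"
        todo.extend(PARENTS[commit])
    return None


sender_commits = reachable("D")             # the sender has A, B, C and D
common = find_common("E", sender_commits)   # E is unknown, its parent C is known

# The sender only subtracts what is in the common commit, never what is in E.
to_send = SNAPSHOT["D"] - SNAPSHOT[common]

print(common, sorted(to_send))
# C ['blob-main-2', 'blob-util-1', 'tree-d']
```

Note that blob-util-1 is sent even though the receiver already has it in E:
the sender never learns what E contains, so it cannot subtract it.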

There may happen to be a tree or a blob in A that matches a tree or a blob
in D.  But because the common ancestor discovery exchange above stops at C,
the sender does not bother enumerating all the objects that are in the
ancestor commits of C when figuring out what objects to send to ensure that
the receiving end has all the objects necessary to complete D.  If you
modified a blob at B (or C) and then resurrected the old version of the
blob at D, it is likely that the blob is going to be sent again when the
receiving end asks for D.

There is some work being done to optimize this further using various
techniques, but they are not ready yet.

[Footnote]

*1* only down to the shallow boundary, if the receiving end is a shallow
clone.
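
The resurrected-blob case can be illustrated with another small Python
sketch.  Again this is a toy model under stated assumptions, not what git
actually runs: the path "config", the blob names and both helper functions
are hypothetical, and the second helper models the optimization that the
sender theoretically could do but currently does not.

```python
# Hypothetical snapshots for the history A---B---C---D: path -> blob name.
SNAPSHOTS = {
    "A": {"config": "blob-v1"},
    "B": {"config": "blob-v2"},  # blob modified at B
    "C": {"config": "blob-v2"},
    "D": {"config": "blob-v1"},  # old version resurrected at D
}


def blobs_to_send(wanted, common):
    """Current behaviour: compare only against the common commit itself."""
    return set(SNAPSHOTS[wanted].values()) - set(SNAPSHOTS[common].values())


def blobs_to_send_with_history(wanted, common_and_ancestors):
    """Hypothetical optimization: also subtract blobs in the ancestors."""
    known = set()
    for commit in common_and_ancestors:
        known |= set(SNAPSHOTS[commit].values())
    return set(SNAPSHOTS[wanted].values()) - known


# Only C is consulted, so the resurrected blob is sent again.
print(blobs_to_send("D", "C"))                            # {'blob-v1'}

# If the ancestors of C were enumerated too, nothing would need sending.
print(blobs_to_send_with_history("D", ["A", "B", "C"]))   # set()
```

The second function has to walk every ancestor of the common commit, which
hints at why enumerating all of history for each transfer is not free.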