On Mon, Sep 16, 2013 at 05:55:36PM -0400, Jeff King wrote: > On Fri, Sep 13, 2013 at 12:09:35PM +0200, Josef Wolf wrote: > > > > > I'm not sure I understand correctly. I see that bitmaps can be used to > > > > implement set operations. But how comes that walking the graph requires a lot > > > > of CPU? Isn't it O(n)? > > > > > > Yes and no. Your "n" there is the entirety of history. > > > > Is this really true? > > Yes. If you know that the receiver has commit X, and you want to know if > it has some blob Y, the only way to know for sure is to look at every > tree of every commit reachable from X, and see whether any of them > references Y. Jeff, in my original example, I did a cherry-pick from origin/somebranch. Even without asking, we can assume with great probability that origin/somebranch is available at origin. And the file in question happens to reside in the tree at the very tip of origin/somebranch, not in some of its ancestors. In this case, there's no need to search the history at all. And even in this pretty simple case, the algorithm seems to fail for some reason. > Yes, you do not have to recurse into sub-trees (or commits) you have > already seen. And we already do that optimization. Why is the file re-transmitted, then? With a little change in the protocol, a very simple optimization could be implemented, avoiding the complicated bitmap strategy you were talking about: Please consider Junio's description of the dialog: [ Junio wrote: ] > Consider this simple history with only a handful of commits (as > usual, time flows from left to right): > > E > / > A---B---C---D > > where D is at the tip of the sending side, E is at the tip of the > receiving side. The exchange goes roughly like this: > > (receiving side): what do you have? > > (sending side): my tip is at D. > > (receiving side): D? I've never heard of it --- please give it > to me. I have E. > > (sending side): E? I don't know about it; must be something you > created since you forked from me. Tell me about > its ancestors. > > (receiving side): OK, I have C. > > (sending side): Oh, C I know about. You do not have to tell me > anything more. A packfile to bring you up to > date will follow. In the last step, instead of sending a packfile, the sending side should send a list of the SHA's which would be included in this packfile. The receiving side would then be able to request all the objects it needs to get up-to-date. I think this change would be considerably simpler than the reachability bitmap you are talking about. And it would avoid all those time consuming traversals through the history and the tree. And it would omit _all_ redundant retransmissions. Even in the case when sender and receiver do not have _any_ common heads at all, _no_ files at all would be retransmitted unnecessarily. -- Josef Wolf jw@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html