On Sun, Jul 17, 2011 at 3:18 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
> What about `git clone`? We're always recomputing the entire DAG
> during it. For a public repository like yours that only contains
> public objects, it's a horrible abuse of the servers that are serving
> the repository...
>
> Just saying, not everything we do winds up being a partial or
> incomplete traversal in the name of performance.

I don't see your point.

OF COURSE we sometimes traverse the whole tree - when we need all the
data. And it's expensive in those cases, but generally those cases are
also cases where the DAG traversal itself is just a tiny part of the
big picture. The commits tend to be almost irrelevant to "git clone",
for example: it tends to be tree and blob objects that are the biggest
cost.

But there are a lot of common operations that would be much too
expensive unless we had the incomplete DAG traversal code. It's what
makes us able to do sub-second merges, it's what makes "gitk @{6am}.."
fast, etc etc.

My point really was that the git DAG structure is really simple.
People learn about DAGs in CS courses the first year. But the kinds of
things that git does, which is to try to partition the DAG without
having to walk it entirely - that's rare. I tried to find papers about
optimized DAG walking, and couldn't (but so many academic papers are
behind a pay-wall that I still don't know if there might be some smart
person who came up with a really good algorithm for what the
git-merge-base stuff does, for example).

                Linus
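
To make the "partition the DAG without having to walk it entirely" idea concrete, here is a minimal Python sketch of a merge-base search that stops early. It is an illustration only, not git's actual code: the function name `merge_bases`, the `parents` and `dates` dictionaries, and the flag names are all assumptions made up for this example, and it assumes every commit has a later timestamp than its parents. Both tips are painted with a flag, the walk proceeds newest-first, and it stops as soon as everything left in the queue is already known to lie below a common ancestor.

```python
import heapq

# Hypothetical flag bits for this sketch (names are illustrative only).
PARENT1, PARENT2, STALE, RESULT = 1, 2, 4, 8


def merge_bases(parents, dates, a, b):
    """Return (merge bases of a and b, number of commits touched).

    parents: dict mapping commit -> list of parent commits
    dates:   dict mapping commit -> timestamp (assumed child > parent)
    """
    flags = {a: PARENT1}
    flags[b] = flags.get(b, 0) | PARENT2
    # Max-heap on date via negated timestamps: newest commit first.
    heap = [(-dates[c], c) for c in {a, b}]
    heapq.heapify(heap)
    results = []

    # Keep going only while some queued commit is not yet known to be
    # an ancestor of a common ancestor; this is what cuts the walk short.
    while any(not (flags[c] & STALE) for _, c in heap):
        _, commit = heapq.heappop(heap)
        paint = flags[commit] & (PARENT1 | PARENT2 | STALE)
        if paint == (PARENT1 | PARENT2):
            # Reachable from both tips: a candidate merge base.
            if not (flags[commit] & RESULT):
                flags[commit] |= RESULT
                results.append(commit)
            # Everything below a common ancestor is uninteresting.
            paint |= STALE
        for p in parents[commit]:
            if (flags.get(p, 0) & paint) == paint:
                continue  # parent already carries all of these bits
            flags[p] = flags.get(p, 0) | paint
            heapq.heappush(heap, (-dates[p], p))

    bases = [c for c in results if not (flags[c] & STALE)]
    return bases, len(flags)


# Example: 100 commits of linear history, then two short branches.
parents = {"c0": []}
dates = {"c0": 0}
for i in range(1, 100):
    parents[f"c{i}"] = [f"c{i - 1}"]
    dates[f"c{i}"] = i
parents.update({"a1": ["c99"], "b1": ["c99"], "a2": ["a1"]})
dates.update({"a1": 100, "b1": 101, "a2": 102})

bases, touched = merge_bases(parents, dates, "a2", "b1")
```

In this example the search finds `c99` as the only merge base while touching just a handful of the 103 commits; the long tail `c0`..`c97` is never visited. Real histories have clock skew and multiple candidate bases, which a production implementation has to handle and this sketch does not.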