On Wed, 1 Aug 2007, Jakub Narebski wrote:
>
> If I remember correctly there were some patches to git which tried to
> better deal with large blobs. In this simple benchmark git was
> outperformed by Mercurial and even Bazaar-NG a bit.

It's almost certainly not the binary blobs. I think almost all the difference is from the cloning, without repacking the source or using a local clone.

The default action for a git clone is to create a pack-file, and do a local clone as if you did it over the network. That is obviously much slower than using the "-l" flag for the _clone_ action, but it tends to be better for the end result - since you get a nice packed starting point, and none of the confusion with hardlinks etc.

[ Maybe I'm just a worry-wart, but hardlinking two repos still makes me worried, even though we never modify the object files. Quite frankly, I almost wish we hadn't ever done "-l" at all, and I cannot really suggest using it. Either use "-s" for the truly shared repository, or use the default pack-generating one. The hardlinking one was simple and made sense, but it's really not very nice.

  But that aversion to "git clone -l" is really totally illogical. The way we do the object handling, hardlinking object files in git is just about the safest operation you can think of - and I *still* shudder at it ]

Now, I think the "always act as if you were network transparent" default is great, but especially if you have never run "git gc" to generate a pack to begin with, it's going to be a very costly thing. And I think that's what the numbers show: that's the only op we do a *lot* worse on than we should.

(The "nonconflicting merge" is probably - once more - the diffstat generation that bites us. That's generally the most costly thing of the whole merge, but I *love* the diffstat).
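As a concrete sketch of the clone variants being discussed, here is a throwaway repository cloned three ways. All paths are hypothetical, and note one caveat: in current git, local clones hardlink object files by default (the "-l" behaviour), so "--no-hardlinks" is used below to approximate the copying default described in the mail.

```shell
# Build a tiny throwaway source repository with one commit
# (hypothetical paths; user.name/user.email set inline so the
# commit works in a clean environment).
git init --quiet source
cd source
echo "hello" > file.txt
git add file.txt
git -c user.name=t -c user.email=t@example.com commit --quiet -m "initial import"
cd ..

# Plain object copy, no sharing: every object file is duplicated.
git clone --quiet --no-hardlinks source clone-packed

# "-l": hardlink the object files instead of copying them - fast,
# but the two repos then share inodes (the behaviour the mail is
# uneasy about, even though git never modifies object files).
git clone --quiet -l source clone-hardlinked

# "-s": truly shared repository - the clone borrows objects through
# .git/objects/info/alternates instead of duplicating them at all.
git clone --quiet -s source clone-shared
```

The visible difference is in the clone's object store: the "-s" clone carries an `objects/info/alternates` file pointing back at the source, while the other two hold their own copies (or hardlinks) of every object.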
That said, even if he had done a "git gc", to be fair he would have had to include the cost of that first garbage collect in the "initial import", so the end result would have been exactly the same. Git _does_ end up having a very odd performance profile: while it's optimized for certain things, the "initial import" is not one of them.

(Which admittedly is a bit odd. The reason I didn't ever seriously even consider monotone was that its initial import was so *incredibly* sucky, and took hours for the kernel. So use "-l" for benchmarks, and damn my "I hate hardlinking repos" idiocy).

So the only way to truly do a fast initial import *and* get a reasonably good initial clone is likely one of:

 - take full advantage of git, and use local branches, instead of bothering with lots of clones. I think that this is often the right thing to do, but it's obviously not fair for comparisons, since it's really something different from what's likely available in the other SCM's. But it's the "git way".

 - use "git clone -s" (or "-l"). I think the hg numbers are the result of hg defaulting to "-l" behaviour. Which makes sense for hg, since people need to clone more (in git, you'd generally work with local branches instead).

 - or the initial import would be done with some "git fast-import" tool, rather than "git add .". We don't do it now, and the resulting pack-file wouldn't be optimal, but it would be reasonable. It would at least cut down a _bit_ on the clone cost.

The other reaction I took away from that (quite reasonable, I think) comparison is that I think Murdock would have been much happier if "git diff" defaulted to "-C". We don't do that (for the best of reasons: interoperability), but maybe we should document the "-M/-C" options more. The options do show up in the man-page, but apparently not obviously enough, since he hadn't noticed.
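For reference, the "-M"/"-C" flags mentioned above turn git's full delete-plus-add output into rename and copy entries. A small sketch, with hypothetical filenames ("--find-copies-harder" is needed for the copy case here because the source file is otherwise unmodified in the diff):

```shell
# Toy repository demonstrating -M (rename detection) and -C (copy detection).
git init --quiet demo
cd demo
printf 'one\ntwo\nthree\n' > original.txt
git add original.txt
git -c user.name=t -c user.email=t@example.com commit --quiet -m "add original"

# Stage a rename: without -M the staged diff is a deletion plus a new
# file; with -M git pairs them up and reports a single rename entry.
git mv original.txt renamed.txt
git diff --cached -M --summary

# Stage a copy as well: -C extends -M to also look for copied files,
# reporting "copy" instead of an apparently unrelated new file.
cp renamed.txt copied.txt
git add copied.txt
git diff --cached -C --find-copies-harder --summary
cd ..
```

Since "git diff" does not default to these flags, a benchmark (or a new user) comparing rename-heavy histories will see much noisier diffs than hg or bzr produce unless "-M"/"-C" are passed explicitly, which is the point the mail is making about documentation.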
Linus