Re: Git benchmark - comparison with Bazaar, Darcs, Git and Mercurial

On Wed, 1 Aug 2007, Jakub Narebski wrote:
> 
> If I remember correctly there were some patches to git which tried to 
> better deal with large blobs. In this simple benchmark git was 
> outperformed by Mercurial and even Bazaar-NG a bit.

It's almost certainly not the binary blobs.

I think almost all the difference is from the cloning, without repacking 
the source or using a local clone.

The default action for a git clone is to create a pack-file, and do a 
local clone as if you did it over the network. That is obviously much 
slower than using the "-l" flag for the _clone_ action, but it tends to be 
better for the end result - since you get a nice packed starting point, 
and none of the confusion with hardlinks etc.

[ Maybe I'm just a worry-wart, but hardlinking two repos still makes me 
  worried. Even though we never modify the object files. 

  Quite frankly, I almost wish we hadn't ever done "-l" at all, and I 
  cannot really suggest using it. Either use "-s" for the truly shared 
  repository, or use the default pack-generating one. The hardlinking one 
  was simple and made sense, but it's really not very nice.

  But that aversion to "git clone -l" is really totally illogical. The way 
  we do the object handling, hardlinking object files in git is just about 
  the most safe operation you can think of - and I *still* shudder at it ]
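
For reference, a minimal sketch of the three clone modes discussed above. 
The scratch directory, repository names and file name are all made up for 
illustration. One assumption to flag: newer git releases take the local 
shortcut by default for plain paths, so "--no-local" is used here to force 
the pack-generating, network-style clone.

```shell
set -e
# Hypothetical scratch layout; every path and name below is an assumption.
work=$(mktemp -d)
cd "$work"

# A tiny source repository that has never been repacked.
git init -q src
(
  cd src
  echo hello > file.txt
  git add file.txt
  git -c user.name=Example -c user.email=example@example.invalid \
      commit -q -m "initial import"
)

# Pack-generating clone: goes through the same transport as a network clone.
git clone -q --no-local src packed

# Hardlinking clone: object files are hardlinked rather than copied.
git clone -q -l src linked

# Shared clone: no objects are copied; the clone borrows them from the
# source through objects/info/alternates.
git clone -q -s src shared

# Inspect what each clone ended up with.
for repo in packed linked shared; do
  echo "== $repo"
  (cd "$repo" && git count-objects -v)
done
cat shared/.git/objects/info/alternates
```

Timing each "git clone" line separately (for example with "time") should 
show where the benchmark's clone cost comes from.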

Now, I think the "always act as if you were network transparent" by 
default is great, but especially if you have never run "git gc" to 
generate a pack to begin with, it's going to be a very costly thing. And I 
think that's what the numbers show. That's the only op we do a *lot* worse 
on than we should.

(The "nonconflicting merge" is probably - once more - the diffstat 
generation that bites us. That's generally the most costly thing of the 
whole merge, but I *love* the diffstat).

That said, even if he had done a "git gc", to be fair he would have had to 
include the cost of that first garbage collect in the "initial import", so 
the end result would have been exactly the same. Git _does_ end up having 
a very odd performance profile, and while it's optimized for certain 
things, the "initial import" is not one of them.
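
A rough sketch of what "include the first gc in the initial import" would 
look like in a benchmark script. The five small files are a hypothetical 
stand-in for a real source tree, and the names are assumptions.

```shell
set -e
# Hypothetical scratch repository standing in for the benchmark tree.
work=$(mktemp -d)
cd "$work"
git init -q import
cd import

for i in 1 2 3 4 5; do
  echo "content $i" > "file$i.txt"
done

# The "initial import" as the benchmark did it: add everything and commit.
git add .
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "initial import"

# At this point every object is a loose file.
git count-objects -v

# The first garbage collect packs them. To be fair, its cost belongs to
# the import step, not to the later clone.
git gc --quiet

# Now the objects live in a pack.
git count-objects -v
```

Wrapping the "git add", "git commit" and "git gc" lines in a single timer 
gives the import number that is comparable to a clone from a packed source.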

(Which admittedly is a bit odd. The reason I didn't ever seriously even 
consider monotone was that the initial import was so *incredibly* sucky, 
and took hours for the kernel. So use "-l" for benchmarks, and damn my 
"I hate hardlinking repos" idiocy).

So the only way to truly do a fast initial import *and* get a reasonably 
good initial clone is likely one of:

 - take full advantage of git, and use local branches, instead of 
   bothering with lots of clones.

   I think that this is often the right thing to do, but it's obviously 
   not fair for comparisons, since it's really something different from 
   what's likely available in the other SCM's. But it's the "git way".

 - use "git clone -s" (or "-l").

   I think the hg numbers are the result of hg defaulting to "-l" 
   behaviour.  Which makes sense for hg, since people need to clone more 
   (in git, you'd generally work with local branches instead).

 - or the initial import would be done with some "git fast-import" thing, 
   rather than "git add ." We don't do it now, and the resulting pack-file 
   wouldn't be optimal, but it would be reasonable. It would at least cut 
   down a _bit_ on the clone cost.
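
To make the first and last options concrete, here is a small sketch: an 
import fed through "git fast-import" instead of "git add .", followed by a 
local branch instead of a second clone. The stream is hand-written for 
illustration; the identity, timestamp, branch names and file name are all 
assumptions, and a real importer would generate the stream from the source 
tree.

```shell
set -e
# Hypothetical scratch repository; all names below are assumptions.
work=$(mktemp -d)
cd "$work"
git init -q fi
cd fi

# A minimal fast-import stream: one blob and one commit that references it.
# Each "data <n>" count is the exact byte length of the payload that
# follows, including its trailing newline.
git fast-import --quiet <<'EOF'
blob
mark :1
data 6
hello

commit refs/heads/imported
mark :2
committer Example <example@example.invalid> 1185926400 +0000
data 15
initial import
M 100644 :1 file.txt

EOF

# Work on a local branch rather than cloning the repository again.
git branch experiment imported

git log --oneline imported
git cat-file -p imported:file.txt
```

The local branch costs essentially nothing to create, which is why it is 
not a fair stand-in for a clone in a cross-SCM comparison.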

The other reaction I took away from that (quite reasonable, I think) 
comparison is that I think Murdock would have been much happier if git 
diff defaulted to "-C". We don't do that (for the best of reasons: 
interoperability), but maybe we should document the "-M/-C" options more. 

The options do show up in the man-page, but apparently not 
obviously enough, since he hadn't noticed.
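
A short sketch of what "-M" and "-C" change in the diff output. The 
repository and file names are hypothetical. "--no-renames" is spelled out 
for the baseline because newer git releases may turn rename detection on 
by default, and the copied file's original is edited in the same commit 
because plain "-C" only considers files modified in that changeset as copy 
sources.

```shell
set -e
# Hypothetical scratch repository; all names below are assumptions.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

commit() {
  git -c user.name=Example -c user.email=example@example.invalid \
      commit -q -m "$1"
}

# Commit 1: a file with enough content to compare.
i=1
while [ "$i" -le 20 ]; do
  echo "line $i"
  i=$((i + 1))
done > a.txt
git add a.txt
commit "add a.txt"

# Commit 2: a pure rename.
git mv a.txt b.txt
commit "rename a.txt to b.txt"

# Commit 3: copy b.txt to c.txt and edit b.txt in the same commit.
cp b.txt c.txt
echo "one more line" >> b.txt
git add b.txt c.txt
commit "copy b.txt to c.txt, edit b.txt"

# Without rename detection the rename shows up as a delete plus an add.
plain=$(git diff --no-renames --name-status HEAD~2 HEAD~1)

# With -M the same change is reported as a rename.
renames=$(git diff -M --name-status HEAD~2 HEAD~1)

# With -C the new file is reported as a copy of the modified one.
copies=$(git diff -C --name-status HEAD~1 HEAD)

printf '%s\n' "$plain" "--" "$renames" "--" "$copies"
```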

			Linus