Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> I have lately added new Git speed benchmark, from Bryan Murdock blog.
> The repository is bit untypical:
>
> <quote>
> By performance, I mean that I used the UNIX time command to see how
> long various basic operations took. Performing the various basic
> operations gave me some insight into the usability of each as well.
> For this test I used a directory with 266 MB of files, 258 KB of which
> were text files, with the rest being image files. I know, kind of
> weird to version all those binary files, but that was the project I
> was interested in testing this out on. Your mileage may vary and all
> that. Here’s a table summarizing the real times reported by time(1):
> </quote>
>
> If I remember correctly there were some patches to git which tried to
> better deal with large blobs. In this simple benchmark git was
> outperformed by Mercurial and even Bazaar-NG a bit.

Yes. And we backed them out more recently. :-(

A while ago someone had issues with large binary blobs being added to the repository as loose objects (e.g. by git-add/git-update-index). Repacking that repository (whether for git-gc or for transport/clone) was ugly, as the large binary blob had to be inflated and then deflated again to encode it in the packfile. The solution was the core.legacyheaders = false configuration setting, which used packfile encoding for loose objects, thereby allowing the packer to just copy the already-compressed data into the output packfile.

Unfortunately we backed that out recently to "simplify the code". We can still read that loose object format, but we can no longer create it, and during packing we don't copy the data (we inflate and re-deflate anyway). So we're back to the horrible inflate/deflate problem. That probably explains the large clone time seen by the author.

I wonder if hg realizes that the two repositories are on the same filesystem and automatically uses hardlinks if possible (aka git clone -l). That would easily explain how they can clone so dang fast. Maybe we should do the same in git-clone; it's a pretty simple thing to do.

I do have to question the author's timing method. I don't know whether the cache was hot or cold, and he doesn't say. I don't know whether the system was 100% idle when running these times, or whether the times were averaged over a few runs. The first run of anything can usually give inaccurate timings, for example because the executable code may not yet be paged in from disk. One of the tools may have had a bias if he poked around with that tool first, before starting the timings, so its executables were still hot in cache. Etc.

However, assuming everything was actually done in a way that the timings can be relied upon...

Regarding the initial file import, it looks like we about broke even with bzr if you add the "initial file import" and "initial commit" times together. Remember that we have to hash and compress the data during git-add; bzr probably delays its equivalent operation(s) until the commit. Summing these two times is probably needed to really compare them.

We were also rather close to hg if you again sum the times, but we do appear to be slower, by about 27s. I guess I find that hard to believe, but sure, maybe hg somehow has a faster codepath for its file revision disk IO than we do. Maybe it's because hg is streaming the data while we load it all in-core first; maybe the author's system had to swap to get enough virtual memory for git-add.
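(If anyone wants to redo that part of the comparison, this is roughly what I mean by summing the two git steps. It is only a sketch; the directory name and the commit message below are made up for illustration:)

    # "project" stands in for the author's 266 MB directory
    $ cd project
    $ git init
    $ time git add .                        # hashes and deflates every file now
    $ time git commit -m 'initial import'   # mostly just writes the tree and commit
    # add the two "real" times together before comparing against the single
    # "initial commit" numbers reported for hg and bzr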
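(And this is the sort of timing discipline I mean above: one untimed run to warm the caches, then a few timed runs to average by hand. git-status here is only a stand-in for whichever command is actually being measured, and the run count of three is arbitrary:)

    # untimed warm-up run so executables and data are paged in
    $ git status >/dev/null
    # then time a few runs and average the "real" values
    $ for i in 1 2 3; do time git status >/dev/null; done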
Maybe it is just because the author's testing methodology was not very good and one or more of these numbers are simply bunk.

Our merge time is pretty respectable given the competition. It's probably within the margin of error of the author's testing methodology.

-- 
Shawn.