Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> I have lately added new Git speed benchmark, from Bryan Murdock blog.
> The repository is bit untypical:
>
> <quote>
> By performance, I mean that I used the UNIX time command to see how
> long various basic operations took. Performing the various basic
> operations gave me some insight into the usability of each as well.
> For this test I used a directory with 266 MB of files, 258 KB of which
> were text files, with the rest being image files. I know, kind of
> weird to version all those binary files, but that was the project I
> was interested in testing this out on. Your mileage may vary and all
> that. Here’s a table summarizing the real times reported by time(1):
> </quote>
>
> If I remember correctly there were some patches to git which tried to
> better deal with large blobs. In this simple benchmark git was
> outperformed by Mercurial and even Bazaar-NG a bit.

Yes. And we backed them out more recently. :-(

A while ago someone had issues with large binary blobs being added to the repository as loose objects (e.g. by git-add/git-update-index). Repacking that repository (whether for git-gc or for transport/clone) was ugly, as the large binary blob had to be inflated and then deflated again to encode it in the packfile. The solution was the core.legacyheaders = false configuration setting, which used packfile encoding for loose objects, thereby allowing the packer to just copy the already-compressed data into the output packfile.

Unfortunately we backed that out recently to "simplify the code". We can still read that loose object format, but we can no longer create it, and during packing we don't copy the data (we inflate and re-deflate anyway). So we're back to the horrible inflate/deflate problem. That probably explains the large clone time seen by the author.

I wonder if hg realizes that the two repositories are on the same filesystem and automatically uses hardlinks if possible (aka git clone -l). That would easily explain how they can clone so dang fast. Maybe we should do the same in git-clone; it's a pretty simple thing to do.

I do have to question the author's timing method. I don't know whether the cache was hot or cold, and he doesn't say. I don't know whether the system was 100% idle when running these times, or whether the times were averaged over a few runs. The first run of anything can usually give inaccurate timings, for example because the executable code may not yet be paged in from disk. One of the tools may have had a bias if he poked around with that tool first, before starting the timings, so its executables were still hot in cache. Etc.

However, assuming everything was actually done in a way that the timings can be relied upon...

Regarding the initial file import, it looks like we about broke even with bzr if you add the "initial file import" and "initial commit" times together. Remember that we have to hash and compress the data during git-add; bzr probably delays its equivalent operation(s) until the commit. Summing these two times is probably needed to really compare them.

We were also rather close to hg if you again sum the times, but we do appear to be slower, by about 27s. I guess I find that hard to believe, but sure, maybe hg somehow has a faster codepath for its file revision disk IO than we do. Maybe it's because hg is streaming the data while we load it all in-core first; maybe the author's system had to swap to get enough virtual memory for git-add.
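(If anyone wants to redo that part of the comparison, this is roughly what I mean by summing the two git steps. It is only a sketch; the directory name and the commit message below are made up for illustration:)

    # "project" stands in for the author's 266 MB directory
    $ cd project
    $ git init
    $ time git add .                        # hashes and deflates every file now
    $ time git commit -m 'initial import'   # mostly just writes the tree and commit
    # add the two "real" times together before comparing against the single
    # "initial commit" numbers reported for hg and bzr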
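(And this is the sort of timing discipline I mean above: one untimed run to warm the caches, then a few timed runs to average by hand. git-status here is only a stand-in for whichever command is actually being measured, and the run count of three is arbitrary:)

    # untimed warm-up run so executables and data are paged in
    $ git status >/dev/null
    # then time a few runs and average the "real" values
    $ for i in 1 2 3; do time git status >/dev/null; done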
Maybe it is just because the author's testing methodology was not very good and one or more of these numbers are simply bunk.

Our merge time is pretty respectable given the competition. It's probably within the margin of error of the author's testing methodology.

-- 
Shawn.