On Mon, May 30, 2011 at 10:36:27AM -0400, Jeff King wrote:

>   1. Grab each blob, check binary-ness, and free. This double-loads in
>      the common, non-binary case.
> [...]
>
> I'll try to take a look at it this week and get some measurements on (1)
> versus (2) for both speed and peak memory usage. And then see if I can
> do better with (3), and implement the "peek" solution both here and in
> regular diff.

I was curious about this, so I stole a few minutes to do some
preliminary benchmarks this morning.

The first thing to look at is the performance of the original code, that
does not check binary-ness at all. It's going to represent the best we
can do with any strategy. So I tried:

  git log -p --cc --merges origin/master

on git.git using both v1.7.5.3 and the jk/combine-diff-binary-etc
branch. And it turns out that the extra loads really don't make a
difference in practice. My best-of-5 for the two cases were:

  $ time git.v1.7.5.3 log -p --cc --merges origin/master >/dev/null
  real    0m59.518s
  user    0m58.672s
  sys     0m0.688s

  $ time git.jk.binary-combined-diff log -p --cc \
      --merges origin/master >/dev/null
  real    0m58.949s
  user    0m58.220s
  sys     0m0.572s

The new code actually came out slightly faster. One reason may be that
there are 3 combined diffs of git-gui/lib/git-gui.ico that we avoid
doing (and just say "Binary files differ"). That's not a lot, but it
gives us a very tiny edge (though that edge is very close to the amount
of noise between runs). Still, I think it implies that the extra loads
in the common non-binary case are not actually measurable.

The peak memory use between the two should be the same (since we free
each blob immediately), but I didn't measure it.

So I think in practice it's not a big deal. I'll still take a look at
the "peek" optimization later this week, since that can make a
difference in some corner cases.
And as part of that, it will probably make sense to keep the buffers
around for small-ish files, so we'll get the optimization I mentioned
more or less for free. I'll also do the check for duplicated sha1s that
you mentioned.

-Peff