Re: [PATCH] blame.c: don't drop origin blobs as eagerly

Jeff King <peff@xxxxxxxx> · Wed, 3 Apr 2019 07:36:05 -0400

On Wed, Apr 03, 2019 at 04:32:30PM +0700, Duy Nguyen wrote:

> That might explain why I could not see significant gain when blaming
> linux.git's MAINTAINERS file (0.5s was shaved out of 13s) even though
> the number of objects read was cut by half (8424 vs 15083).

I did a few timings, too, and saw similar improvements (only a small
fraction of the total runtime, and only for large files). I think the
main thing is simply that loading blobs from the object database is
only a small part of the total work. We still have to actually diff the
blobs, which is at least as expensive as loading them from disk.

We also have to load commits and trees from disk as we traverse.
Enabling the commit-graph would shrink that portion (and make
improvements in the blob loading proportionally more impressive).
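
As rough arithmetic for "proportionally more impressive", here is a
minimal sketch. The 13s and 0.5s figures come from the timing quoted
above; the 4s of commit/tree loading that the commit-graph might remove
is a purely hypothetical number picked for illustration, not a
measurement, and relative_gain() is just a helper name made up here.

```python
def relative_gain(total_s, saved_s, removed_overhead_s=0.0):
    """Fraction of the runtime saved, after removed_overhead_s has been
    taken out of the baseline (e.g. by a faster commit traversal)."""
    baseline = total_s - removed_overhead_s
    if baseline <= 0:
        raise ValueError("removed overhead must be smaller than the total")
    return saved_s / baseline


# Figures from the quoted timing: 0.5s saved out of 13s.
today = relative_gain(13.0, 0.5)

# HYPOTHETICAL: assume the commit-graph removed 4s of traversal cost.
with_graph = relative_gain(13.0, 0.5, removed_overhead_s=4.0)

print(f"{today:.1%} -> {with_graph:.1%}")  # prints "3.8% -> 5.6%"
```

The absolute saving is unchanged; only the baseline it is measured
against shrinks.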

All that said, this seems like an easy and obvious win, and worth doing.
0.5s is still something.

I suspect we could do even better by storing and reusing not just the
original blob between diffs, but the intermediate diff state (i.e., the
hashes produced by xdl_prepare(), which should be usable between
multiple diffs). That's quite a bit more complex, though, and I imagine
would require some surgery to xdiff.
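
To make the idea concrete, here is a toy sketch in Python rather than C.
It is not xdiff's API or data layout: PreparedBlob, PrepareCache and
changed_line_count() are hypothetical names, crc32 stands in for
whatever line hash the real code uses, and the "diff" is a trivial
set-membership check. The only point it illustrates is that during a
blame walk a blob is typically diffed twice (once as the parent, once as
the child), so per-line hashes computed once per blob could be reused.

```python
import zlib


class PreparedBlob:
    """Per-blob state that depends on one side only: lines and their hashes."""

    def __init__(self, data: bytes):
        self.lines = data.splitlines(keepends=True)
        # crc32 is a stand-in hash chosen for this sketch (an assumption).
        self.hashes = [zlib.crc32(line) for line in self.lines]


class PrepareCache:
    """Caches PreparedBlob objects by blob id so each blob is hashed once."""

    def __init__(self, load_blob):
        self._load_blob = load_blob  # callable: blob id -> bytes
        self._prepared = {}
        self.prepare_calls = 0       # counts how often we really hashed

    def get(self, blob_id):
        if blob_id not in self._prepared:
            self.prepare_calls += 1
            self._prepared[blob_id] = PreparedBlob(self._load_blob(blob_id))
        return self._prepared[blob_id]


def changed_line_count(cache, old_id, new_id):
    """Toy stand-in for a diff: lines of new whose hash is absent from old."""
    old_hashes = set(cache.get(old_id).hashes)
    return sum(1 for h in cache.get(new_id).hashes if h not in old_hashes)


# Three made-up versions of one file, newest last.
blobs = {
    "v1": b"a\nb\n",
    "v2": b"a\nb\nc\n",
    "v3": b"a\nx\nc\n",
}
cache = PrepareCache(blobs.__getitem__)

# Walk history newest-first, as blame does: v3 vs v2, then v2 vs v1.
newest_step = changed_line_count(cache, "v2", "v3")
older_step = changed_line_count(cache, "v1", "v2")

# "v2" took part in both diffs but was only prepared once.
print(cache.prepare_calls)  # prints 3 (it would be 4 without the cache)
```

Parts of the real preparation may depend on both sides of the diff
rather than on one blob alone, which this sketch ignores entirely; that
is the sort of thing that would make the real change more invasive.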

-Peff