commit-graph paranoia performance, was Re: [ANNOUNCE] Git v2.43.0-rc1

Jeff King <peff@xxxxxxxx> · Mon, 13 Nov 2023 15:55:38 -0500

On Thu, Nov 09, 2023 at 02:33:54AM +0900, Junio C Hamano wrote:

>  * The codepath to traverse the commit-graph learned to notice that a
>    commit is missing (e.g., corrupt repository lost an object), even
>    though it knows something about the commit (like its parents) from
>    what is in commit-graph.
>    (merge 7a5d604443 ps/do-not-trust-commit-graph-blindly-for-existence later to maint).

I happened to be timing "rev-list" for an unrelated topic today, and I
noticed that this change had a rather large effect. The commit message
for 7a5d604443 claims a 30% performance regression. But that's when
using "--topo-order", and actually writing out the result.

Running "rev-list --count" on a copy of linux.git with a fully-built
commit-graph shows that the run-time doubles:

  Benchmark 1: git.v2.42.1 rev-list --count HEAD
    Time (mean ± σ):     658.0 ms ±   5.2 ms    [User: 613.5 ms, System: 44.4 ms]
    Range (min … max):   650.2 ms … 666.0 ms    10 runs

  Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
    Time (mean ± σ):      1.333 s ±  0.019 s    [User: 1.263 s, System: 0.069 s]
    Range (min … max):    1.302 s …  1.361 s    10 runs

  Summary
    git.v2.42.1 rev-list --count HEAD ran
      2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD

Now in defense of that patch, this particular command is going to be one
of the most sensitive in terms of percent change, simply because it
isn't doing much besides walking the commits. And 650ms isn't _that_ big
in an absolute sense. But it also doesn't quite feel like nothing, even
tacked onto a command that might otherwise take 1000ms to run.

Should we default GIT_COMMIT_GRAPH_PARANOIA to "0"? Yes, some operations
might miss a breakage, but that is true of so much of Git. For day to
day commands we generally assume that the repository is not corrupted,
and avoid looking at any data we can. Other commands (like "commit-graph
verify", but maybe others) would probably want to be more careful
(either by checking this case explicitly, or by enabling the paranoia
flag themselves).

-Peff