On Fri, Mar 20, 2020 at 09:44:23PM -0600, Taylor Blau wrote:

> In my testing environment, this improves the time to "merge" a split
> commit-graph containing all reachable commits in the kernel by
> re-writing the same commit-graph (effectively measuring the time it
> takes to check that all of those commits still exist) from:
>
>   Attempt 1: 9.614
>   Attempt 2: 10.984
>   Attempt 3: 10.39
>   Attempt 4: 9.14
>   Attempt 5: 9.439
>
>   real    0m9.140s
>   user    0m8.207s
>   sys     0m0.602s
>
> to:
>
>   Attempt 1: 9.12
>   Attempt 2: 8.904
>   Attempt 3: 9.361
>   Attempt 4: 9.288
>   Attempt 5: 9.677
>
>   real    0m8.904s
>   user    0m8.208s
>   sys     0m0.596s
>
> yielding a modest ~2.6% improvement in the best timings from each run,
> and ~7.4% improvement on average.

That still seems really slow to me. If we were truly eliding the load of
most of the commit objects, I'd expect an order of magnitude or so
improvement.

For example, with a fully constructed commit graph in linux.git, I get:

  $ time git -c core.commitGraph=1 rev-list HEAD | wc -l
  886922

  real    0m1.088s
  user    0m0.659s
  sys     0m1.161s

  $ time git -c core.commitGraph=0 rev-list HEAD | wc -l
  886922

  real    0m7.185s
  user    0m6.729s
  sys     0m1.882s

Obviously not the same operation, but that should give us a rough idea
that commit graph lookups are 6-7 times cheaper than loading the actual
objects.

I don't remember the details of the case that originally led us towards
this patch. Can you share more of the setup you used to generate the
numbers above (which repo, but more importantly the commands to create
the initial state and then time the test)?

The patch otherwise still makes sense to me, but I suspect there are
other similar optimizations nearby that we'll need to do in tandem.

-Peff
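
P.S. My (possibly wrong) guess at the kind of setup you mean is roughly:

  # guesses on my part: linux.git as the repo, and that a second
  # "--split" write over an existing chain is what triggers the layer
  # merge whose timing you quoted
  $ git -C linux.git commit-graph write --reachable --split
  $ time git -C linux.git commit-graph write --reachable --split

but I may well be misremembering which knobs are needed to actually
force the merge, which is why I'm asking for the exact commands.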