Re: [PATCH] list-objects: don't queue root trees unless revs->tree_objects is set

Jeff King <peff@xxxxxxxx> · Wed, 11 Sep 2019 21:19:52 -0400

On Wed, Sep 11, 2019 at 09:11:37PM -0400, Jeff King wrote:

> Let's try it after running "git commit-graph write":
> 
>   [before]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.458 s ±  0.011 s    [User: 1.199 s, System: 0.259 s]
>     Range (min … max):    1.447 s …  1.481 s    10 runs
> 
>   [after]
>   Benchmark #1: git -C linux rev-list HEAD >/dev/null
>     Time (mean ± σ):      1.126 s ±  0.023 s    [User: 896.5 ms, System: 229.0 ms]
>     Range (min … max):    1.106 s …  1.181 s    10 runs
> 
> Now that's more like it. We saved over 22% of the total time. Part of
> that is because the runtime is shorter overall, but the absolute
> improvement is also much larger. What's going on?

Another thing I noticed is that rev-list line-buffers when we're writing
to /dev/null. This is actually the doing of glibc's stdio, as it
considers the character device special enough to turn off full buffering
(we also do our own manual flush after each commit).
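
For illustration only, here is a rough sketch of the kind of decision
described above: look at what sits behind the descriptor and pick a
buffering mode from that. This is not glibc's or git's actual code; the
helper names pick_buffer_mode() and apply_buffer_mode() are made up, and
the rule "regular file means full buffering, anything else means line
buffering" is an assumption chosen to match the behavior seen here.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: choose a stdio buffering mode from the type of
 * the file behind fd. Regular files get full buffering; everything else
 * (character devices such as /dev/null, pipes, sockets, or a failed
 * fstat) gets line buffering so each record is pushed out promptly.
 */
int pick_buffer_mode(int fd)
{
	struct stat st;

	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode))
		return _IOFBF;
	return _IOLBF;
}

/*
 * Apply the chosen mode to a stream. setvbuf() has to be called before
 * the first read or write on fp. Returns 0 on success.
 */
int apply_buffer_mode(FILE *fp)
{
	return setvbuf(fp, NULL, pick_buffer_mode(fileno(fp)), BUFSIZ);
}
```

Calling apply_buffer_mode(stdout) at startup would give full buffering
for ">file" and line buffering for ">/dev/null" or a pipe, which is the
split the two sets of timings in this thread are measuring.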

I think it's probably a fairer test to time it that way (quite often
you'd be writing to a pipe, which would have the same behavior). But our
improvement is even better as a percentage when writing to a file:

  [before]
  Benchmark #1: git -C linux rev-list HEAD >file
  Time (mean ± σ):      1.046 s ±  0.017 s    [User: 922.7 ms, System: 104.3 ms]
  Range (min … max):    1.031 s …  1.087 s    10 runs

  [after]
  Benchmark #1: git -C linux rev-list HEAD >file
  Time (mean ± σ):     741.4 ms ±  14.1 ms    [User: 644.8 ms, System: 75.9 ms]
  Range (min … max):   721.2 ms … 766.8 ms    10 runs

That's a 29% improvement instead of 22% (and shows that write() syscalls
are wasting close to 30% of our runtime, as well).
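
The percentages can be checked from the mean times reported above. A
small sketch of the arithmetic (the function name pct_saved() is just
for illustration, and the "write overhead" lines simply compare the
/dev/null run against the file run, on the assumption that the extra
flushing accounts for the difference):

```c
#include <stdio.h>

/* Percentage of the "before" time that the "after" time saves. */
double pct_saved(double before, double after)
{
	return 100.0 * (before - after) / before;
}

/* Print the comparisons using the mean times (in seconds) from above. */
void print_summary(void)
{
	/* patch improvement, writing to /dev/null: roughly 22.8% */
	printf("dev-null: %.1f%%\n", pct_saved(1.458, 1.126));
	/* patch improvement, writing to a file: roughly 29.1% */
	printf("file:     %.1f%%\n", pct_saved(1.046, 0.7414));
	/* /dev/null vs file, before the patch: roughly 28.3% */
	printf("flush overhead (before): %.1f%%\n", pct_saved(1.458, 1.046));
	/* /dev/null vs file, after the patch: roughly 34.2% */
	printf("flush overhead (after):  %.1f%%\n", pct_saved(1.126, 0.7414));
}
```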

I wonder if it would be worth teaching rev-list a --buffer option. Or
just kicking it in automatically when we're just printing single oids.
Once upon a time the single-record flushing was useful for:

  git rev-list HEAD -- <pathspec> | git diff-tree ...

to feed incremental results as soon as we have them (imagine we see one
commit which touches the pathspec, then go through 100,000 that don't).
But these days "git log" does that all internally (and typically
outputs quite a bit more between each flush, though one could argue that
"log --oneline" might want the same behavior).
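
To make the trade-off concrete, a minimal sketch of what a buffered mode
could look like. Nothing here exists in git: emit_oid() and its
"buffered" flag are hypothetical stand-ins for a possible --buffer
option, and bytes_on_disk() is only there so the effect can be observed.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical emitter: print one object id per line. With buffered == 0
 * we flush after every record, so a reader on the other end sees it
 * immediately. With buffered != 0 the record stays in the stdio buffer
 * until the buffer fills or the caller flushes.
 * Returns 0 on success, -1 on error.
 */
int emit_oid(FILE *out, const char *hex, int buffered)
{
	if (fprintf(out, "%s\n", hex) < 0)
		return -1;
	if (!buffered && fflush(out))
		return -1;
	return 0;
}

/* Number of bytes that have actually reached the file behind "out". */
long bytes_on_disk(FILE *out)
{
	struct stat st;

	if (fstat(fileno(out), &st) < 0)
		return -1;
	return (long)st.st_size;
}
```

With a fully buffered stream, a buffered emit_oid() call leaves
bytes_on_disk() unchanged, while an unbuffered call pushes out
everything queued so far at the cost of one write() per record. That
per-record write() is the overhead showing up in the timings above, and
the delayed output is exactly what would hurt the rev-list | diff-tree
pipeline case.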

I dunno. Maybe it's not worth micro-optimizing too hard, but I was
surprised how big a difference it made.

-Peff