Re: [PATCH v2 3/3] connected: implement connectivity check using bitmaps

Jeff King <peff@xxxxxxxx> · Wed, 30 Jun 2021 01:45:03 -0400

On Tue, Jun 29, 2021 at 11:07:47PM -0400, Taylor Blau wrote:

> On Tue, Jun 29, 2021 at 10:04:56PM -0400, Jeff King wrote:
> > In the warm-cache case, the improvement seems to go away (or maybe I'm
> > holding it wrong; even in the cold-cache case I don't get anywhere near
> > as impressive a speedup as you showed above). Which isn't to say that
> > cold-cache isn't sometimes important, and this may still be worth doing.
> > But it really seems like the CPU involve in walking over the file isn't
> > actually that much.
> 
> Hmm. I think that you might be holding it wrong, or at least I'm able to
> reproduce some substantial improvements in the warm cache case with
> limited traversals.

OK, I definitely was holding it wrong. It turns out that it helps to run
the custom version of Git when passing in the pack.writebitmapcomittable
option. (I regret there is no portable way to communicate a facepalm
image over plain text).

So that helped, but I did seem some other interesting bits.

Here's my cold-cache time for the commit you selected:

  $ export commit=2ab38c17aac10bf55ab3efde4c4db3893d8691d2
  $ hyperfine \
      -L table 0,1 \
      'GIT_READ_COMMIT_TABLE={table} git.compile rev-list --count --objects --use-bitmap-index $commit' \
      --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      1.420 s ±  0.162 s    [User: 196.1 ms, System: 293.7 ms]
    Range (min … max):    1.083 s …  1.540 s    10 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      1.319 s ±  0.256 s    [User: 171.1 ms, System: 237.1 ms]
    Range (min … max):    0.773 s …  1.588 s    10 runs

  Summary
    'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
      1.08 ± 0.24 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

So better, but well within the noise (and I had a few runs where it
actually performed worse). But you picked that commit because you knew
it was bitmapped, and it's not in my repo. If I switch to a commit that
is covered in my repo, then I get similar results to yours:

  $ export commit=9b1ea29bc0d7b94d420f96a0f4121403efc3dd85
  $ hyperfine \
        -L table 0,1 \
        'GIT_READ_COMMIT_TABLE={table} git.compile rev-list --count --objects --use-bitmap-index $commit' \
        --prepare='sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):     309.3 ms ±  61.0 ms    [User: 19.4 ms, System: 79.0 ms]
    Range (min … max):   154.4 ms … 386.7 ms    10 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      33.7 ms ±   2.5 ms    [User: 3.3 ms, System: 3.6 ms]
    Range (min … max):    31.5 ms …  46.5 ms    33 runs

  Summary
  'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
    9.19 ± 1.94 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

And the effect continues in the warm cache case, though the absolute
numbers are much tinier:

  Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):      12.0 ms ±   0.3 ms    [User: 4.6 ms, System: 7.4 ms]
    Range (min … max):    11.4 ms …  13.2 ms    219 runs

  Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit
    Time (mean ± σ):       3.0 ms ±   0.4 ms    [User: 2.3 ms, System: 0.8 ms]
    Range (min … max):     2.6 ms …   5.5 ms    851 runs

  Summary
    'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $commit' ran
      3.95 ± 0.55 times faster than 'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $commit'

That implies to me that yes, it really is saving time, especially in the
cold-cache case. But if you have to do any actual fill-in traversal, the
benefits get totally lost in the noise. _Especially_ in the cold-cache
case, because then we're faulting in the actual object data, etc.

By the way, one other thing I noticed is that having a fully-build
commit-graph also made a big difference (much bigger than this patch),
especially for the non-bitmapped commit. Which makes sense, since it is
saving us from loading commit objects from disk during fill-in
traversal.

So I dunno. There's absolutely savings for some cases, but I suspect in
practice it's not going to really be noticeable. Part of me says "well,
if it ever provides a benefit and there isn't a downside, why not?". So
just devil's advocating on downsides for a moment:

  - there's some extra complexity in the file format and code to read
    and write these (and still fall back to the old system when they're
    absent). I don't think it's a deal-breaker, as it's really not that
    complicated a feature.

  - we're using extra bytes on disk (and the associated cold block cache
    effects there). It's not very many bytes, though (I guess 20 for the
    hash, plus a few offset integers; if we wanted to really
    penny-pinch, we could probably store 32-bit pointers to the hashes
    in the associated .idx file, at the cost of an extra level of
    indirection while binary searching). But that is only for a few
    hundred commits that are bitmapped, not all of them. And it's
    balanced by not needing to allocate a similar in-memory lookup table
    in each command. So it's probably a net win.

> > I got an even more curious result when adding in "--not --all" (which
> > the connectivity check under discussion would do). There the improvement
> > from your patch should be even less, because we'd end up reading most of
> > the bitmaps anyway. But I got:
> 
> Interesting. Discussion about that series aside, I go from 28.6ms
> without reading the table to 35.1ms reading it, which is much better in
> absolute timings, but much worse relatively speaking.

I suspect that's mostly noise. With "--not --all" in a regular linux.git
repo, I don't find any statistical difference.

In a fake repo with one ref per commit, everything is even more lost in
the noise (because we spend like 2000ms loading up all the tip commits).
I suspect it's worse on a real repo with lots of refs (the "spiky
branches" thing I mentioned earlier in the thread), since there we'd
have to do even more fill-in traversal.

> I can't quite seem to make sense of the perf report on that command.
> Most of the time is spent faulting pages in, but most of the time
> measured in the "git" object is in ubc_check. I don't really know how to
> interpret that, but I'd be curious if you had any thoughts.

ubc_check() is basically "computing sha1" (we check the sha1 on every
object we call parse_object() for). I'd guess it's just time spent
loading the negative tips (commits if you don't have a commit graph,
plus tags to peel, though I guess we should be using the packed-refs
peel optimization here).

-Peff