[PATCH 0/3] pack-bitmap: boundary-based bitmap traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Here is a short (but dense) series that I worked on towards the end of
2021 with Peff.

Its goal is to improve the bitmap traversal we implement in
prepare_bitmap_walk(). Specifically, it avoids building up a complete
bitmap of the "haves" side, and instead uses a combination of (a)
UNINTERESTING commits between the tips and boundary, and (b) the
boundary itself.

The gory details are laid out in 3/3, but the high-level overview of the
new algorithm to compute the set of objects between "haves" and "wants"
using bitmaps is:

  1. Build a (partial) bitmap of the haves side by first OR-ing any
     bitmap(s) that already exist for UNINTERESTING commits between the
     haves and the boundary.

  2. For each commit along the boundary, add it as a fill-in traversal
     tip (where the traversal terminates once an existing bitmap is
     found), and perform fill-in traversal.

  3. Build up a complete bitmap of the wants side as usual, stopping any
     time we intersect the (partial) haves side.

This improves many cases where using bitmaps was significantly *slower*
than regular, non-bitmap traversal. In some instances, using bitmaps is
still slower than the non-bitmap case, but it is a substantial
improvement over the classic bitmap walk.

    $ ours="$(git branch --show-current)"
      argv="--count --objects $ours --not --exclude=$ours --branches"
      hyperfine \
        -n 'no bitmaps' "git.compile rev-list $argv" \
        -n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \
        -n 'this commit' "git.compile rev-list --use-bitmap-index $argv"
    Benchmark 1: no bitmaps
      Time (mean ± σ):      15.1 ms ±   4.1 ms    [User: 8.1 ms, System: 6.9 ms]
      Range (min … max):     7.4 ms …  21.8 ms    131 runs

    Benchmark 2: existing bitmap traversal
      Time (mean ± σ):      82.6 ms ±   9.2 ms    [User: 63.6 ms, System: 19.0 ms]
      Range (min … max):    73.8 ms … 105.4 ms    28 runs

    Benchmark 3: this commit
      Time (mean ± σ):      19.8 ms ±   3.1 ms    [User: 13.0 ms, System: 6.8 ms]
      Range (min … max):    17.7 ms …  38.6 ms    158 runs

    Summary
      'no bitmaps' ran
        1.31 ± 0.41 times faster than 'this commit'
        5.49 ± 1.62 times faster than 'existing bitmap traversal'

In a large repository on GitHub.com, the timings to compute the objects
unique to "master", as in:

    $ git rev-list --count --objects master --not --exclude=master --branches

improve as follows:

  - 1.012 sec (without bitmaps)
  - 29.571 sec (with the existing bitmap walk)
  - 2.279 sec (with these patches)

The first couple of patches are preparatory, and the third patch is the
substantive one.

Please have a look and poke any holes you can find in the new approach
:-). Thanks in advance for your review.

Taylor Blau (3):
  revision: support tracking uninteresting commits
  pack-bitmap.c: extract `fill_in_bitmap()`
  pack-bitmap.c: use commit boundary during bitmap traversal

 pack-bitmap.c | 252 ++++++++++++++++++++++++++++++++++++++------------
 revision.c    |  10 +-
 revision.h    |   5 +
 3 files changed, 205 insertions(+), 62 deletions(-)

-- 
2.40.0.380.gda896aa358.dirty



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux