Re: [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/24/2023 8:00 PM, Taylor Blau wrote:
> Here is a short (but dense) series that I worked on towards the end of
> 2021 with Peff.
> 
> Its goal is to improve the bitmap traversal we implement in
> prepare_bitmap_walk(). Specifically, it avoids building up a complete
> bitmap of the "haves" side, and instead uses a combination of (a)
> UNINTERESTING commits between the tips and boundary, and (b) the
> boundary itself.
> 
> The gory details are laid out in 3/3, but the high-level overview of the
> new algorithm to compute the set of objects between "haves" and "wants"
> using bitmaps is:
> 
>   1. Build a (partial) bitmap of the haves side by first OR-ing any
>      bitmap(s) that already exist for UNINTERESTING commits between the
>      haves and the boundary.
> 
>   2. For each commit along the boundary, add it as a fill-in traversal
>      tip (where the traversal terminates once an existing bitmap is
>      found), and perform fill-in traversal.
> 
>   3. Build up a complete bitmap of the wants side as usual, stopping any
>      time we intersect the (partial) haves side.

In other words: this generates something closer to the object set in the
non-bitmapped object walk. The only difference is that the new bitmapped
algorithm will see objects that were re-introduced across the boundary
(say, a blob was reverted to its older mode).
 
> This improves many cases where using bitmaps was significantly *slower*
> than regular, non-bitmap traversal. In some instances, using bitmaps is
> still slower than the non-bitmap case, but it is a substantial
> improvement over the classic bitmap walk.> 
>     $ ours="$(git branch --show-current)"
>       argv="--count --objects $ours --not --exclude=$ours --branches"
>       hyperfine \
>         -n 'no bitmaps' "git.compile rev-list $argv" \
>         -n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \
>         -n 'this commit' "git.compile rev-list --use-bitmap-index $argv"
>     Benchmark 1: no bitmaps
>       Time (mean ± σ):      15.1 ms ±   4.1 ms    [User: 8.1 ms, System: 6.9 ms]
>       Range (min … max):     7.4 ms …  21.8 ms    131 runs
> 
>     Benchmark 2: existing bitmap traversal
>       Time (mean ± σ):      82.6 ms ±   9.2 ms    [User: 63.6 ms, System: 19.0 ms]
>       Range (min … max):    73.8 ms … 105.4 ms    28 runs
> 
>     Benchmark 3: this commit
>       Time (mean ± σ):      19.8 ms ±   3.1 ms    [User: 13.0 ms, System: 6.8 ms]
>       Range (min … max):    17.7 ms …  38.6 ms    158 runs
> 
>     Summary
>       'no bitmaps' ran
>         1.31 ± 0.41 times faster than 'this commit'
>         5.49 ± 1.62 times faster than 'existing bitmap traversal'
> 
> In a large repository on GitHub.com, the timings to compute the objects
> unique to "master", as in:
> 
>     $ git rev-list --count --objects master --not --exclude=master --branches
> 
> improve as follows:
> 
>   - 1.012 sec (without bitmaps)
>   - 29.571 sec (with the existing bitmap walk)
>   - 2.279 sec (with these patches)

For my curiosity, and since you already have a test environment set up,
could you redo the "without bitmaps" case with pack.useSparse true and
false? When the option was created and defaulted to true, we never
really explored comparing it to the bitmap case. In fact, I assumed the
bitmap case was faster in important cases like this (where there is a
substantial difference in object counts), but your data is surprising me
that the sparse algorithm is outperforming bitmaps, even with this new
algorithm.

The main question I'd like to ask is: is pack.useSparse contributing
in an important way to the performance here?
 
I'll go poking into the patches now.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux