On Fri, Dec 15, 2023 at 10:37:57AM -0500, Taylor Blau wrote:

> On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> > So my question is: how much of what you're seeing is from (1) and (2),
> > and how much is from (3)? Because there are other ways to trigger (3),
> > such as lowering the window size. For example, if you try your same
> > packing example with --window=0, how do the CPU and output size compare
> > to the results of your series? (I'd also check peak memory usage).
>
> Interesting question! Here are some preliminary numbers on my machine
> (which runs Debian unstable with an Intel Xeon W-2255 CPU @ 3.70GHz and
> 64GB of RAM).
>
> I ran the following hyperfine command on my testing repository, which
> has the Git repository broken up into ~75 packs or so:

Thanks for running these tests. The results are similar to what I
expected, which is: yes, most of your CPU savings are from skipping
deltas, but not all.

Here's what I see (which I think is mostly redundant with what you've
said, but I just want to lay out my line of thinking). I'll reorder your
quoted sections a bit as I go:

> Benchmark 2: multi-pack reuse, pack.window=0
> [...]
>   Time (mean ± σ):      1.075 s ±  0.005 s    [User: 0.990 s, System: 0.188 s]
>   Range (min … max):    1.071 s …  1.088 s    10 runs
>
> Benchmark 4: multi-pack reuse, pack.window=10
> [...]
>   Time (mean ± σ):      1.028 s ±  0.002 s    [User: 1.150 s, System: 0.184 s]
>   Range (min … max):    1.026 s …  1.032 s    10 runs

OK, so when we're doing more full ("multi") reuse, the pack window
doesn't make a big difference either way. You didn't show the stderr
from each, but presumably most of the objects are hitting the "reuse"
path, and only a few are deltas (and that is backed up by the fact that
doing deltas gives us only a slight improvement in the output size):

> Benchmark 2: multi-pack reuse, pack.window=0
> 268.670 MB
>
> Benchmark 4: multi-pack reuse, pack.window=10
> 266.473 MB

Comparing the runs with less reuse:

> Benchmark 1: single-pack reuse, pack.window=0
> [...]
>   Time (mean ± σ):      1.248 s ±  0.004 s    [User: 1.160 s, System: 0.188 s]
>   Range (min … max):    1.244 s …  1.259 s    10 runs
>
> Benchmark 3: single-pack reuse, pack.window=10
> [...]
>   Time (mean ± σ):      6.281 s ±  0.024 s    [User: 43.727 s, System: 0.492 s]
>   Range (min … max):    6.252 s …  6.326 s    10 runs

there obviously is a huge amount of time saved by not doing deltas, but
we pay for it with a much bigger pack:

> Benchmark 1: single-pack reuse, pack.window=0
> 264.443 MB
>
> Benchmark 3: single-pack reuse, pack.window=10
> 194.355 MB

But of course that "much bigger" pack is about the same size as the one
we get from doing multi-pack reuse. Which is not surprising, because
both are avoiding looking for new deltas (and the packs after the
preferred one probably have mediocre deltas).

So I do actually think that disabling pack.window gives you a
similar-ish tradeoff to expanding the pack-reuse code (~6s down to ~1s,
and a 36% embiggening of the resulting pack size). Which implies that
one option is to scrap your entire series and just set pack.window.
Basically comparing multi/10 (your patches) to single/0 (a hypothetical
config option), which have similar run-times and pack sizes.
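To make that hypothetical route concrete, a minimal sketch on the
serving side (pack.window is an existing config knob; whether 0 is a
sane server-wide default is exactly what's in question here):

  # skip the delta search entirely when repacking; objects are written
  # whole or reuse deltas that already exist on disk
  git config pack.window 0
  git repack -a -d

The same effect is available per-invocation via `git pack-objects
--window=0`, which is presumably what the single/0 benchmark above is
measuring.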
But that's not quite the whole story. There is still a CPU improvement
in your series (1.2s vs 1.0s, a 20% speedup). And as I'd expect, a
memory improvement from avoiding the extra book-keeping (~7%):

> Benchmark 1: single-pack reuse, pack.window=0
> 354.224 MB (max RSS)
>
> Benchmark 4: multi-pack reuse, pack.window=10
> 328.786 MB (max RSS)

So while it's a lot less code to just set the window size, I do think
those improvements are worth it. And really, it's the same tradeoff we
make for the single-pack case (i.e., one could argue that we
could/should rip out the verbatim-reuse code entirely in favor of just
tweaking the window size).

> It's pretty close between multi-pack reuse with a window size of 0 and
> a window size of 10. If you want to optimize for pack size, you could
> trade a ~4% reduction in pack size for a ~1% increase in peak memory
> usage.

I think if you want to optimize for pack size, you should consider
repacking all-into-one to get better on-disk deltas. ;) I know that's
easier said than done when the I/O costs are significant. I do wonder
if storing thin packs on disk would let us more cheaply reach a state
that could serve optimal-ish packs without spending CPU computing
bespoke deltas for each client. But that's a much larger topic.

-Peff