On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> So my question is: how much of what you're seeing is from (1) and (2),
> and how much is from (3)? Because there are other ways to trigger (3),
> such as lowering the window size. For example, if you try your same
> packing example with --window=0, how do the CPU and output size compare
> to the results of your series? (I'd also check peak memory usage).

Interesting question! Here are some preliminary numbers on my machine
(which runs Debian unstable with an Intel Xeon W-2255 CPU @ 3.70GHz and
64GB of RAM).

I ran the following hyperfine command on my testing repository, which
has the Git repository broken up into ~75 packs or so:

    $ hyperfine -L v single,multi -L window 0,10 \
        --show-output \
        -n '{v}-pack reuse, pack.window={window}' \
        'git.compile \
          -c pack.allowPackReuse={v} \
          -c pack.window={window} \
          pack-objects --revs --stdout --use-bitmap-index --delta-base-offset <in 2>/dev/null | wc -c'

Which gave the following results for runtime:

    Benchmark 1: single-pack reuse, pack.window=0
    [...]
      Time (mean ± σ):      1.248 s ±  0.004 s    [User: 1.160 s, System: 0.188 s]
      Range (min … max):    1.244 s …  1.259 s    10 runs

    Benchmark 2: multi-pack reuse, pack.window=0
    [...]
      Time (mean ± σ):      1.075 s ±  0.005 s    [User: 0.990 s, System: 0.188 s]
      Range (min … max):    1.071 s …  1.088 s    10 runs

    Benchmark 3: single-pack reuse, pack.window=10
    [...]
      Time (mean ± σ):      6.281 s ±  0.024 s    [User: 43.727 s, System: 0.492 s]
      Range (min … max):    6.252 s …  6.326 s    10 runs

    Benchmark 4: multi-pack reuse, pack.window=10
    [...]
      Time (mean ± σ):      1.028 s ±  0.002 s    [User: 1.150 s, System: 0.184 s]
      Range (min … max):    1.026 s …  1.032 s    10 runs

Here are the average numbers for the resulting "clone" size in each of
the above configurations:

    Benchmark 1: single-pack reuse, pack.window=0
    264.443 MB
    Benchmark 2: multi-pack reuse, pack.window=0
    268.670 MB
    Benchmark 3: single-pack reuse, pack.window=10
    194.355 MB
    Benchmark 4: multi-pack reuse, pack.window=10
    266.473 MB

So it looks like setting pack.window=0 (with either single- or
multi-pack reuse) yields pack output of a similar size to multi-pack
reuse with any window setting.

Running the same benchmark as above again, but this time sending the
pack output to /dev/null and instead capturing the maximum RSS value
from `/usr/bin/time -v` gives us the following (averages, in MB):

    Benchmark 1: single-pack reuse, pack.window=0
    354.224 MB (max RSS)
    Benchmark 2: multi-pack reuse, pack.window=0
    315.730 MB (max RSS)
    Benchmark 3: single-pack reuse, pack.window=10
    470.651 MB (max RSS)
    Benchmark 4: multi-pack reuse, pack.window=10
    328.786 MB (max RSS)

So memory usage is similar between runs except for the single-pack reuse
case with a window size of 10.

It looks like the sweet spot is probably multi-pack reuse with a
small-ish window size, which achieves the best of both worlds (small
pack size, relative to other configurations that reuse large portions of
the pack, and low memory usage).

It's pretty close between multi-pack reuse with a window size of 0 and
a window size of 10. If you want to optimize for pack size, you could
trade a ~1% reduction in pack size for a ~4% increase in peak memory
usage.

Of course, YMMV depending on the repository, packing strategy, and
pack.window configuration (among others) while packing. But this should
give you a general idea of what to expect.

Thanks,
Taylor
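
As a quick sanity check on the trade-off between pack.window=0 and
pack.window=10 under multi-pack reuse, the relative differences can be
recomputed from the size and max RSS figures above. This is only a
sketch: `pct_change` is a hypothetical helper written for illustration
(it is not part of Git or hyperfine), and it assumes a POSIX shell with
awk available.

```shell
#!/bin/sh
# pct_change OLD NEW: print the relative change from OLD to NEW as a
# signed percentage with one decimal place.
# (Hypothetical helper for illustration; not part of Git or hyperfine.)
pct_change () {
	awk -v old="$1" -v new="$2" \
		'BEGIN { printf "%+.1f\n", (new - old) / old * 100 }'
}

# Multi-pack reuse: pack.window=0 (old) versus pack.window=10 (new),
# using the averages reported in the tables above.
echo "pack size: $(pct_change 268.670 266.473)%"
echo "max RSS:   $(pct_change 315.730 328.786)%"
```

With the averages reported above, this prints -0.8% for pack size and
+4.1% for max RSS.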