On Tue, Nov 28, 2023 at 02:07:54PM -0500, Taylor Blau wrote:

> Performing verbatim pack reuse naturally trades off between CPU time and
> the resulting pack size. In the above example, the single-pack reuse
> case produces a clone size of ~194 MB on my machine, while the
> multi-pack reuse case produces a clone size closer to ~266 MB, which is
> a ~37% increase in clone size.

Right, it's definitely a tradeoff.

So taking a really big step back, there are a few optimizations all tied
up in the verbatim reuse code:

  1. In some cases we get to dump whole swaths of the on-disk packfile to
     the output, covering many objects with a few memcpy() calls. (This
     is still O(n), of course, but it's fewer instructions per object.)

  2. Any other reused objects require only a small-ish amount of work to
     fix up ofs deltas, handle gaps, and so on. We also get to skip
     adding them to the packing_list struct (this saves some CPU, but
     also a lot of memory).

  3. We skip the delta search for these reused objects. This is where
     your big CPU / output-size tradeoff comes into play, I'd think.

So my question is: how much of what you're seeing is from (1) and (2),
and how much is from (3)? Because there are other ways to trigger (3),
such as lowering the window size. For example, if you try your same
packing example with --window=0, how do the CPU time and output size
compare to the results of your series? (I'd also check peak memory
usage.)

-Peff
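For reference, a minimal sketch of the kind of comparison I mean for (3).
This is not the exact setup from the series; it builds a small throwaway
repository (the commit count and file contents are just illustrative) and
compares the pack size produced by the default delta window against
--window=0, which disables the delta search entirely:

```shell
#!/bin/sh
# Sketch: compare pack-objects output size with and without a delta
# search. The throwaway repo here is only for illustration; in practice
# you'd run this against the real repository from the clone test and
# also record CPU time and peak memory (e.g. via /usr/bin/time).
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Create a series of similar blobs so the delta search has work to do.
i=1
while [ "$i" -le 50 ]; do
  seq 1 "$i" >file.txt
  git add file.txt
  git -c user.name=t -c user.email=t@example.com commit -qm "rev $i"
  i=$((i + 1))
done

# Pack all refs twice: once with the default window (10), once with the
# delta search disabled (--window=0), and report the resulting sizes.
for w in 10 0; do
  git pack-objects --all --stdout --window="$w" </dev/null >"pack-w$w.pack"
  echo "window=$w bytes=$(wc -c <"pack-w$w.pack")"
done
```

The delta between the two sizes gives a rough upper bound on how much of
the output-size cost comes from skipping the delta search alone, as
opposed to the reuse machinery in (1) and (2).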