On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > Alas, there they are. They are basically no different than having the
> > name-hash for single pack bitmaps, it's just now we don't throw them
> > away when generating a MIDX bitmap from a state where the repository
> > already has a single-pack bitmap.
>
> I actually wasn't expecting any CPU/time difference.

I think it is possible to see the CPU usage go down without affecting
the resulting pack size. See below for a more detailed analysis.

> I hope that we are talking about the same name-hash, which is used
> to sort the blobs so that when pack-objects try to find a good delta
> base, the blobs from the same path will sit close to each other and
> hopefully fit in the pack window.

Yes, of course.

> The effect I was hoping to see by not discarding the information was
> that we find better delta base hence smaller deltas in the resulting
> packfiles.

I think it is possible to observe either a decrease in CPU or a decrease
in the resulting pack size.

In my experience having the name-hash filled in results in finding good
delta pairs much more quickly than without, but that in many
repositories the resulting pack size is basically the same. In other
words, the resulting pack is pretty similar whether you use the
name-hash or not, it just affects how quickly you get there.

Some experiments to back that up: I instrumented the existing p5326 by
replacing anything like "pack-objects ... --stdout >/dev/null" with
"pack-objects ... --stdout >pack.tmp" and then added test_size's to
measure the size of each pack.

On the tip of this branch, the results are:

Test                              origin/tb/multi-pack-bitmaps   HEAD
----------------------------------------------------------------------------
5326.5: simulated clone size      3.3G                           3.3G +0.0%
5326.7: simulated fetch size      10.5M                          10.5M -0.2%
5326.21: clone (partial bitmap)   3.3G                           3.3G +0.0%

Looking at c171d3e677 (pack-bitmap: implement optional name_hash cache,
2013-10-22), I modified[1] that script to replace timing pack-objects
with counting the number of bytes it wrote. Doing that shows that the
name-hash doesn't make a substantial difference in the resulting pack
size (numbers on a recent-ish copy of the kernel):

Test                      c171d3e677d777c50231d8dea32ae691936da819^   c171d3e677d777c50231d8dea32ae691936da819
--------------------------------------------------------------------------------------------------------------
9999.3: simulated clone   3.2G                                        3.2G +0.0%
9999.4: simulated fetch   32                                          32 +0.0%
9999.6: partial bitmap    3.1G                                        3.1G +0.0%

(As a mostly-unrelated aside, I was curious why the pack size jumped
from 3.2GB to 3.3GB, but I can reproduce that jump even in p5310--the
single pack bitmap test--on the tip of my branch. So it does appear to
be a regression which I'll look into, but it's unrelated to this branch
or MIDX bitmaps).

Thanks,
Taylor

[1]: https://gist.github.com/ttaylorr/6cfa3eb9fd012f81b833873d50f96f71
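
For anyone following along who has not looked at the name-hash discussed
above, here is a minimal Python sketch of the idea, modeled on
pack_name_hash() in pack-objects.h. Treat it as an approximation for
illustration rather than the exact implementation; the sample paths in
the demo are chosen only for the example.

```python
def pack_name_hash(name: str) -> int:
    """Approximate name-hash: a sortable 32-bit number in which the
    trailing characters of the path carry the most weight."""
    h = 0
    for byte in name.encode("utf-8"):
        if byte in b" \t\n\v\f\r":  # whitespace does not contribute
            continue
        # Older characters decay by two bits per step, while the newest
        # character lands in the top byte of the 32-bit value.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    # Illustrative paths only (an assumption for the demo).
    paths = [
        "Makefile",
        "arch/x86/include/asm/processor.h",
        "README",
        "arch/arm/include/asm/processor.h",
        "drivers/net/Makefile",
    ]
    # Sorting by the hash is what lets similar names sit close together.
    for p in sorted(paths, key=pack_name_hash):
        print(f"{pack_name_hash(p):08x}  {p}")
```

Every version of the same path gets the same value, and paths that
share their last sixteen characters get nearly identical values, so
sorting on it tends to place likely delta candidates within the same
pack window.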