Jeff King <peff@xxxxxxxx> writes:

> On Mon, Sep 13, 2021 at 07:05:32PM -0700, Junio C Hamano wrote:
>
>> Taylor Blau <me@xxxxxxxxxxxx> writes:
>>
>> > Alas, there they are. They are basically no different than having the
>> > name-hash for single pack bitmaps, it's just now we don't throw them
>> > away when generating a MIDX bitmap from a state where the repository
>> > already has a single-pack bitmap.
>>
>> I actually wasn't expecting any CPU/time difference.
>
> I was, for the same reason we saw an improvement there in ae4f07fbcc
> (pack-bitmap: implement optional name_hash cache, 2013-12-21): without a
> name-hash, we try a bunch of fruitless deltas before we find a decent
> one.

Nice.  We learn new things every once in a while ;-)

>> I hope that we are talking about the same name-hash, which is used
>> to sort the blobs so that when pack-objects try to find a good delta
>> base, the blobs from the same path will sit close to each other and
>> hopefully fit in the pack window.
>
> Yes, exactly. We spend less time finding the good ones if the likely
> candidates are close together. We may _also_ find better ones overall,
> depending on the number of candidates and the window size.

It is a pleasant surprise that in a real history like linux.git we
can find a good delta base even without the name hash (unless we
are using an insanely wide window size, that is).  The objects in
such a case will be sorted solely by size, larger to smaller, and we
need to find a good delta base within that window.  It may not be as
horrible as fast-import (which only tries to delta against the
single previous object), but with ~70k paths in a single revision
and a history that is ~1m deep, I was pessimistic that a size-only
sort would drop even a pair of blobs from the same path within the
default window size of 10.

But it seems that such pessimism was unwarranted, which is good
news ;-).
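
To make the difference between the two sort orders concrete, here is a rough sketch in Python, not git's actual code. The name_hash() function is modeled from memory on pack_name_hash() in git's C sources, so treat its details as an assumption. The same_path_hits() helper, the sample objects, and the tiny window of 1 are all hypothetical and exist only for illustration.

```python
def name_hash(path: str) -> int:
    """Sortable 32-bit number built from the trailing non-whitespace
    characters of a path (assumed to mirror git's pack_name_hash())."""
    h = 0
    for ch in path.encode():
        if ch in b" \t\n\r\x0b\x0c":   # whitespace does not contribute
            continue
        # Later characters land in the high bits, so "*.c" sorts together.
        h = ((h >> 2) + (ch << 24)) & 0xFFFFFFFF
    return h


def by_size(obj):
    """Size-only order: larger objects first."""
    _path, size = obj
    return -size


def by_name_hash(obj):
    """Name-hash order, then larger objects first within the same hash."""
    path, size = obj
    return (name_hash(path), -size)


def same_path_hits(objects, key, window=10):
    """Hypothetical helper: count objects that find an object from the
    same path among the `window` entries sorted just before them."""
    ordered = sorted(objects, key=key)
    hits = 0
    for i, (path, _size) in enumerate(ordered):
        candidates = ordered[max(0, i - window):i]
        if any(p == path for p, _ in candidates):
            hits += 1
    return hits


# Two versions of a.c with one version of b.c sized in between them.
objects = [("a.c", 100), ("b.c", 90), ("a.c", 80)]

print("size only:", same_path_hits(objects, by_size, window=1))
print("name hash:", same_path_hits(objects, by_name_hash, window=1))
```

With a window of 1, the size-only order puts b.c between the two versions of a.c, so neither sees the other as a candidate, while the name-hash order keeps them adjacent. A real repository has many more candidates per window, which is presumably why the size-only sort does better than this toy case suggests.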