On Wed, Feb 03, 2021 at 10:59:13PM -0500, Taylor Blau wrote: > From: Jeff King <peff@xxxxxxxx> > > This is the same as the regular repack test, except that we mark the > single base pack as "kept" and use --assume-kept-packs-closed. The I don't think that option exists anymore. I guess we are just using --stdin-packs, which causes us to mark a pack as kept. I think we could just mark it in the filesystem and use --honor-pack-keep, which would make it independent of your new feature. At first I was going to say "but it doesn't matter either way", but... > theory is that this should be faster than the normal repack, because > we'll have fewer objects to traverse and process. > > Here are some timings on a recent clone of the kernel. In the > single-pack case, there is nothing do since there are no non-excluded > packs: > > 5303.5: repack (1) 57.42(54.88+10.64) > 5303.6: repack with --stdin-packs (1) 0.01(0.01+0.00) > > and in the 50-pack case, it is much faster to use `--stdin-packs`, since > we avoid having to consider any objects in the excluded pack: > > 5303.10: repack (50) 71.26(88.24+4.96) > 5303.11: repack with --stdin-packs (50) 3.49(11.82+0.28) > > but our improvements vanish as we approach 1000 packs. > > 5303.15: repack (1000) 215.64(491.33+14.80) > 5303.16: repack with --stdin-packs (1000) 198.79(380.51+7.97) > > That's because the code paths around handling .keep files are known to > scale badly; they look in every single pack file to find each object. Well, part of it is just that with 1000 packs we have 20 times as many objects that are actually getting packed with --stdin-packs, compared to the 50-pack case. IIRC, each pack is a fixed-size slice and then the residual is put into the .keep pack. So the fact that the time gets closer to a full repack as we add more packs is expected: we are asking pack-objects to do more work! For showing the impact of the optimizations in patches 7 and 8, I think doing a full repack with --honor-pack-keep is a better test. Because then we're always doing a full traversal, and most of the work continues to scale with the repo size (though obviously not the actual shuffling of packed bytes around). That would get rid of the weird "no work to do" case in the single-pack tests, too. -Peff