On Tue, Feb 16, 2021 at 06:58:16PM -0500, Jeff King wrote: > On Wed, Feb 03, 2021 at 10:59:13PM -0500, Taylor Blau wrote: > > > From: Jeff King <peff@xxxxxxxx> > > > > This is the same as the regular repack test, except that we mark the > > single base pack as "kept" and use --assume-kept-packs-closed. The > > I don't think that option exists anymore. I guess we are just using > --stdin-packs, which causes us to mark a pack as kept. > > I think we could just mark it in the filesystem and use > --honor-pack-keep, which would make it independent of your new feature. > At first I was going to say "but it doesn't matter either way", but... > > > theory is that this should be faster than the normal repack, because > > we'll have fewer objects to traverse and process. > > > > Here are some timings on a recent clone of the kernel. In the > > single-pack case, there is nothing do since there are no non-excluded > > packs: > > > > 5303.5: repack (1) 57.42(54.88+10.64) > > 5303.6: repack with --stdin-packs (1) 0.01(0.01+0.00) > > > > and in the 50-pack case, it is much faster to use `--stdin-packs`, since > > we avoid having to consider any objects in the excluded pack: > > > > 5303.10: repack (50) 71.26(88.24+4.96) > > 5303.11: repack with --stdin-packs (50) 3.49(11.82+0.28) > > > > but our improvements vanish as we approach 1000 packs. > > > > 5303.15: repack (1000) 215.64(491.33+14.80) > > 5303.16: repack with --stdin-packs (1000) 198.79(380.51+7.97) > > > > That's because the code paths around handling .keep files are known to > > scale badly; they look in every single pack file to find each object. > > Well, part of it is just that with 1000 packs we have 20 times as many > objects that are actually getting packed with --stdin-packs, compared to > the 50-pack case. IIRC, each pack is a fixed-size slice and then the > residual is put into the .keep pack. So the fact that the time gets > closer to a full repack as we add more packs is expected: we are asking > pack-objects to do more work! No, the residual base pack isn't marked as kept on-disk. But the --stdin-packs test treats it as such, by passing '^pack-$base_pack.pack' as input to '--stdin-packs' (thus marking it as kept in-core). > For showing the impact of the optimizations in patches 7 and 8, I think > doing a full repack with --honor-pack-keep is a better test. Because > then we're always doing a full traversal, and most of the work continues > to scale with the repo size (though obviously not the actual shuffling > of packed bytes around). That would get rid of the weird "no work to do" > case in the single-pack tests, too. I think you're suggesting that we change the "repack ($nr_packs)" test to have the residual pack marked as kept (so we're measuring time it takes to repack everything that _isn't_ in the base pack)? That would allow a more direct comparison, but I think it's loosing out on an important aspect which is how long it takes to pack the entire repository. Maybe we want three. What do you think? > -Peff Thanks, Taylor