On Wed, Feb 17, 2021 at 02:13:47PM -0500, Taylor Blau wrote: > > > That's because the code paths around handling .keep files are known to > > > scale badly; they look in every single pack file to find each object. > > > > Well, part of it is just that with 1000 packs we have 20 times as many > > objects that are actually getting packed with --stdin-packs, compared to > > the 50-pack case. IIRC, each pack is a fixed-size slice and then the > > residual is put into the .keep pack. So the fact that the time gets > > closer to a full repack as we add more packs is expected: we are asking > > pack-objects to do more work! > > No, the residual base pack isn't marked as kept on-disk. But the > --stdin-packs test treats it as such, by passing '^pack-$base_pack.pack' > as input to '--stdin-packs' (thus marking it as kept in-core). Sorry, I perhaps shouldn't have said ".keep" here. But it's the same thing, isn't it? The 50 pack case is packing 50*pack_size objects (because it's excluding everything else that is in the base pack we mark as keep-in-core), and the 1000-pack case is packing 1000*pack_size objects (for the same reason). So any patterns we see between them have more to do with that, than how the keep-handling code scales with the number of non-kept packs. > > For showing the impact of the optimizations in patches 7 and 8, I think > > doing a full repack with --honor-pack-keep is a better test. Because > > then we're always doing a full traversal, and most of the work continues > > to scale with the repo size (though obviously not the actual shuffling > > of packed bytes around). That would get rid of the weird "no work to do" > > case in the single-pack tests, too. > > I think you're suggesting that we change the "repack ($nr_packs)" test > to have the residual pack marked as kept (so we're measuring time it > takes to repack everything that _isn't_ in the base pack)? > > That would allow a more direct comparison, but I think it's loosing out > on an important aspect which is how long it takes to pack the entire > repository. Maybe we want three. That was what I was suggesting, but I think it's equivalent to what your --stdin-packs is testing. I guess the most interesting thing would actually be an _additional_ pack mark as .keep (and that pack does not even have to contain anything interesting -- the point is how much effort it costs to find that out. Of course the bigger it is the more pronounced the effect of avoiding lookups in it). -Peff