On Sun, Aug 29, 2021 at 10:48:43PM -0400, Taylor Blau wrote: > This short series is extracted from mine and Peff's work on cruft packs. These > three patches focus on cleaning up add_objects_in_unpacked_packs(), which is > used to implement `git repack -k`. > > The pay-off for this clean-up is significant, though: we net -50 lines of code, > and the result is much more readable, at least in my opinion. > > The changes are described in detail in the patch messages, but essentially we > are replacing a loop over get_all_packs() with for_each_packed_object() after > adding a couple of new flags necessary to make the switch. And once we are done > with that, the third patch removes a bit from the object flag allocation table. Thanks for extracting these from that other work. I gave them all an extra read-through and didn't find anything wrong. > builtin/pack-objects.c | 85 ++++++------------------------------------ > object-store.h | 6 +++ > object.h | 1 - > packfile.c | 6 +++ > 4 files changed, 24 insertions(+), 74 deletions(-) Of course I love this diffstat, and I _really_ love dropping the one-off bit from the flag allocation table. But I wanted to also give some numbers for the lookup_unknown_object() part of the final patch. Those unknown objects cost 72 bytes of heap each (the same size as a commit, since it's the biggest struct and the unknown object is a union). We've seen some real-world cases where there are 40M+ unreachable objects. So that's almost 3GB of wasted RAM during pack-objects just to store those "did I see it already" bits. :) Of course, we are already spending 96 bytes per object in "struct object_entry", but at least that's doing more useful stuff. :) (And the more obvious question is: why not delete those objects. The answer thus far has been: because git's pruning is racy, and thus we don't run it as part of our automatic maintenance). -Peff