Jeff King <peff@xxxxxxxx> writes:

> On Mon, Aug 19, 2024 at 11:07:51AM +0200, Patrick Steinhardt wrote:
>
>> It mostly boils down to git-repack(1) doing a connectivity check,
>> whereas git-pack-objects(1) doesn't. We just soak up every single loose
>> object, and then eventually we expire them via git-multi-pack-index(1)'s
>> "expire" subcommand.
>
> Hmph. I'd have suggested that we should teach git-repack to do the more
> efficient thing. I'm a bit worried about having parallel universes of
> how maintenance works making it harder to reason about when or how
> things happen, and how various concurrent / racy behaviors work.

I'd suggest being careful before going there. The above only explains
why it is OK not to exclude unreachable cruft, but it does not address
another thing we should be worried about, which is the quality of the
resulting pack.

Throwing a random set of object names at pack-objects in the order they
are discovered by for_each_loose_file_in_objdir(), which is what
gc.c:pack_loose() does, gives none of the locality benefit that walking
the commits would. If we assume that we run pack_loose() often enough
that we won't have a huge number of objects in the resulting pack, then
packing objects that are close in the history may not matter much; but
on the other hand, if we run pack_loose() so often that each run
produces only a small pack, we would not get great delta base selection
either.

So we should probably monitor how much "badness" pack_loose() is
causing, and if it turns out to be too much, we may need to reconsider
its design. Being able to produce, ultra-quickly, a pack whose layout
and delta base choices hurt runtime performance is not a feature.

> But it's probably a bit late to re-open that (and certainly it's not
> part of your series).

True.

Thanks.
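To visualize the difference being discussed, here is a rough shell
sketch (the throwaway repo, commit contents, and pack base names are all
made up for illustration) contrasting the two ways object names can
reach pack-objects: a directory-scan-order list of loose object names
with no pathname hints, roughly what gc.c:pack_loose() feeds it, versus
a revision walk, which supplies traversal order plus path hints that
pack-objects uses when grouping delta candidates:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com \
        commit -q --allow-empty -m one
git -c user.name=t -c user.email=t@example.com \
        commit -q --allow-empty -m two

# (1) Scan-order input: enumerate loose object files and turn each
# ".git/objects/ab/cdef..." path back into the object name "abcdef...".
# No path hints accompany the names, so delta grouping has little to go on.
find .git/objects -type f -path '*/??/*' |
        sed 's|.*objects/\(..\)/|\1|' |
        git pack-objects -q pack-scan >/dev/null

# (2) Walk-order input: rev-list with --objects emits "<oid> <path>"
# lines, giving pack-objects both history locality and name hints.
git rev-list --objects --all |
        git pack-objects -q pack-walk >/dev/null

ls pack-scan-*.pack pack-walk-*.pack
```

Both invocations succeed on a tiny repo like this; the quality gap only
shows up at scale, which is why the text above suggests monitoring the
resulting packs rather than assuming either outcome.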