Jeff King <peff@xxxxxxxx> writes:

> On Mon, Aug 19, 2024 at 11:07:51AM +0200, Patrick Steinhardt wrote:
>
>> It mostly boils down to git-repack(1) doing a connectivity check,
>> whereas git-pack-objects(1) doesn't. We just soak up every single loose
>> object, and then eventually we expire them via git-multi-pack-index(1)'s
>> "expire" subcommand.
>
> Hmph. I'd have suggested that we should teach git-repack to do the more
> efficient thing. I'm a bit worried about having parallel universes of
> how maintenance works making it harder to reason about when or how
> things happen, and how various concurrent / racy behaviors work.

I'd suggest being careful before going there. The above only explains
why it is OK not to exclude unreachable cruft, but it does not address
another thing we should be worried about, which is the quality of the
resulting pack.

Throwing a random set of object names at pack-objects in the order they
are discovered by for_each_loose_file_in_objdir(), which is what
gc.c:pack_loose() does, gives none of the locality benefit that walking
the commits would. If we assume that we run pack_loose() often enough
that we won't have a huge number of objects in the resulting pack, then
packing objects that are close in the history may not matter much; but
on the other hand, if we run pack_loose() so often that each run
produces only a small pack, we would not get great delta base selection
either.

So we should probably monitor how much "badness" pack_loose() is
causing, and if it turns out to be too much, we may need to reconsider
its design. Being able to produce, ultra-quickly, a pack whose layout
and delta base choices hurt runtime performance is not a feature.

> But it's probably a bit late to re-open that (and certainly it's not
> part of your series).

True.

Thanks.
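To visualize the difference being discussed, here is a rough shell
sketch (the throwaway repo, commit contents, and pack base names are all
made up for illustration) contrasting the two ways object names can
reach pack-objects: a directory-scan-order list of loose object names
with no pathname hints, roughly what gc.c:pack_loose() feeds it, versus
a revision walk, which supplies traversal order plus path hints that
pack-objects uses when grouping delta candidates:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q repo && cd repo
git -c user.name=t -c user.email=t@example.com \
        commit -q --allow-empty -m one
git -c user.name=t -c user.email=t@example.com \
        commit -q --allow-empty -m two

# (1) Scan-order input: enumerate loose object files and turn each
# ".git/objects/ab/cdef..." path back into the object name "abcdef...".
# No path hints accompany the names, so delta grouping has little to go on.
find .git/objects -type f -path '*/??/*' |
        sed 's|.*objects/\(..\)/|\1|' |
        git pack-objects -q pack-scan >/dev/null

# (2) Walk-order input: rev-list with --objects emits "<oid> <path>"
# lines, giving pack-objects both history locality and name hints.
git rev-list --objects --all |
        git pack-objects -q pack-walk >/dev/null

ls pack-scan-*.pack pack-walk-*.pack
```

Both invocations succeed on a tiny repo like this; the quality gap only
shows up at scale, which is why the text above suggests monitoring the
resulting packs rather than assuming either outcome.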