Patrick Steinhardt <ps@xxxxxx> writes:

>> The benefit of doing this is as above (fetch speedup), but the drawback
>> is that if the packfile to be indexed references a local blob directly
>> (that is, not through a local tree), that local blob is in danger of
>> being garbage collected. Such a situation may arise if we push local
>> commits, including one with a change to a blob in the root tree,
>> and then the server incorporates them into its main branch through a
>> "rebase" or "squash" merge strategy, and then we fetch the new main
>> branch from the server.
>
> Okay, so we know that we are basically doing the wrong thing with the
> optimization, but by skipping blobs we can get a significant speedup and
> the failure mode is that we will re-fetch the object in a later step.
> And because we think the situation is rare it shouldn't be a huge issue
> in practice.

That is how I read it, but the description may want to make the pros
and cons more explicit. One of the reasons why users choose a lazy
clone is to avoid grabbing large blobs until they need them, so I am
not sure how they would feel about having to discard a potentially
large blob they created themselves, only to fetch it again later. As
long as it does not happen repeatedly for the same blob, it probably
is OK.

> Without the context of the commit message this code snippet likely would
> not make any sense to a reader. The "correct" logic would be to record
> all objects, regardless of whether they are a blob or not. But we
> explicitly choose not to as a tradeoff between performance and
> correctness.
>
> All to say that we should have a comment here that explains what is
> going on.

Thanks for reviewing and commenting.
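
For concreteness, the scenario described in the commit message could
be reproduced with a sequence along these lines (the repository URL
and the file name are made up for illustration):

    # Lazy clone that omits blobs until they are actually needed.
    git clone --filter=blob:none https://example.com/repo.git
    cd repo

    # Create a blob in the root tree locally and push it out.
    echo data >large.bin
    git add large.bin
    git commit -m 'add large.bin to the root tree'
    git push origin HEAD:refs/heads/topic

    # Suppose the server squash-merges "topic" into "main"; the new
    # tip of "main" then references large.bin directly from its root
    # tree.

    # The pack received here references our local blob directly
    # (that is, not through a local tree); if index-pack skips
    # recording that link, a later "git gc" may prune the blob, and
    # it would have to be re-fetched when next needed.
    git fetch origin main

The question discussed above is about that last step: whether
recording the directly referenced blob is worth the cost, or whether
skipping it and accepting an occasional re-fetch is the better
tradeoff.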