Re: [PATCH v2] index-pack: remove fetch_if_missing=0

Kousik Sanagavarapu <five231003@xxxxxxxxx> · Sun, 12 Mar 2023 22:46:13 +0530

On Sat, 11 Mar 2023 at 03:11, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> I admit I haven't thought about it any longer than anybody who
> touched this topic, but should "fetch_if_missing=0" really be
> treated as "it was a dirty hack in the past, now we do not need it,
> as all callers into the object layer avoids lazy fetching when they
> do not have to, so let's remove it"?  It looks to me more and more
> that the old world-view to disable lazy fetching by default and have
> individual calls to the object layer opt into fetching as needed may
> give us a better resulting code, or is it just me?

I think having a single function to check for object existence, which is
compatible with partial clones is better, because the end goal is to
completely integrate the concept of partial clones with git's codebase
and not have code that worries "Oh, there maybe bugs here because
what if the user has a partial clone", everytime the code does an object
existence check (that is, a call to has_object_file() or any of its
related functions) and just have fetch_if_missing set to 1 or 0, according
to the particular command.

It is also true that has_object() is not that "single function", because
in cases where we are missing an object in partial clone and want it,
has_object() has no way of fetching it. With or without flags (it only
supports one flag which, when set, rechecks packed storage), it does not
lazy-fetch in a partial clone. But there are cases where we need such
objects, such as the commands that come into "family (b)" [1].

So, why not use oid_object_info_extended() directly, instead of wrapping
it with some other function, whenever we are checking for an object's
existence. We can skip lazy-fetches whenever we want with
OBJECT_INFO_SKIP_FETCH_OBJECT and can also prefetch with
OBJECT_INFO_FOR_PREFETCH [2].

[1] https://lore.kernel.org/git/20230311025906.4170554-1-jonathantanmy@xxxxxxxxxx/

[2] pack-index itself is one example of where this is done.

    When we don't have REF_DELTA bases, we bulk prefetch them

	    if (has_promisor_remote()) {
                     /*
                      * Prefetch the delta bases.
                      */
                     struct oid_array to_fetch = OID_ARRAY_INIT;
		     for (i = 0; i < nr_ref_deltas; i++) {
                             struct ref_delta_entry *d = sorted_by_pos[i];
                             if (!oid_object_info_extended(the_repository, &d->oid,
                                                           NULL,
                                                           OBJECT_INFO_FOR_PREFETCH))
                                     continue;
                             oid_array_append(&to_fetch, &d->oid);
                     }
                     promisor_remote_get_direct(the_repository,
                                                to_fetch.oid, to_fetch.nr);
                     oid_array_clear(&to_fetch);
             }

    Instead of going object-by-object, which is basically like an
    infinite loop in large repos and partial clones are widely used
    in large repos.