Re: [PATCH] index-pack: remove fetch_if_missing=0

On Tue, 28 Feb 2023 at 03:44, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>
> Kousik Sanagavarapu <five231003@xxxxxxxxx> writes:
> > A collision test is triggered in sha1_object(), whenever there is an
> > object file in our repo. If our repo is a partial clone, then checking
> > for this file existence has the behavior of lazy-fetching the object
> > because we have one or more promisor remotes.
>
> Hmm...this is not true, because (as you said)...
>
> > This behavior is controlled by setting fetch_if_missing to 0,
>
> ...this makes it so that we don't fetch in this situation.

Yes, that statement is false if fetch_if_missing is set to 0. But my original
intent in writing it was for anyone reading the commit message to understand
the motivation for setting fetch_if_missing to 0.

> [...]
>
> > @@ -1728,14 +1727,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
> >       int report_end_of_input = 0;
> >       int hash_algo = 0;
> >
> > -     /*
> > -      * index-pack never needs to fetch missing objects except when
> > -      * REF_DELTA bases are missing (which are explicitly handled). It only
> > -      * accesses the repo to do hash collision checks and to check which
> > -      * REF_DELTA bases need to be fetched.
> > -      */
> > -     fetch_if_missing = 0;
>
> I think that the author of such a commit (you) should also independently
> verify that this comment is true (and if it is, then yes, all the
> remaining cases are handled and we can remove this assignment to
> fetch_if_missing). I believe this comment to be true, but I haven't
> checked the code in a while so I'm not sure myself.

It does indeed seem that sha1_object() is the only place where lazy-fetching
is possible. I checked this by looking up the calls to
oid_object_info_extended() and to any other function in object-file.c that
depends on it.

In builtin/index-pack.c, we have (in the order that these functions appear)

- check_object()
    Call to oid_object_info(), but we return early with
    0 if we don't have an object.

- sha1_object()
    Call to has_object_file_with_flags() (which this patch replaces with
    has_object()), where lazy-fetching is possible.
    
    Calls to oid_object_info() and read_object_file(), which trigger only
    when the above has_object_file_with_flags() succeeds.

- fix_unresolved_deltas()
    Call to oid_object_info_extended(), we prefetch delta bases.

    Call to read_object_file(), but we only read data from ref_delta_entry.
    In case it was a delta base, we already prefetched it.

There are cases where we fsck objects, but lazy-fetching is already handled
in fsck (although by setting fetch_if_missing to 0).
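
The call sites listed above can be located with a plain grep. This is only a
rough sketch: list_object_lookups is a hypothetical helper (not part of git),
and the pattern covers just the function names mentioned in this thread, so
it will miss lookups that go through other wrappers.

```shell
# Hypothetical helper (not part of git): print, with line numbers, the lines
# of a C source file that call the object-lookup functions discussed here.
list_object_lookups () {
	grep -nE '(oid_object_info(_extended)?|read_object_file|has_object(_file_with_flags)?)[[:space:]]*\(' "$1"
}

# Example, run from the top of a git checkout:
#   list_object_lookups builtin/index-pack.c
```

Each hit still has to be read by hand to see whether a missing object can
reach it.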

Do we need to be explicit about this in the commit message, i.e. that
sha1_object() is the only place where a lazy fetch can happen when the repo
is a partial clone?

> > +test_expect_success 'index-pack does not lazy-fetch when checking for sha1 collisions' '
> > +     rm -rf server promisor-remote client &&
> > +     rm -rf object-count &&
> > +
> > +     git init server &&
> > +     for i in 1 2 3 4
> > +     do
> > +             echo $i >$(pwd)/server/file$i &&
> > +             git -C server add file$i &&
> > +             git -C server commit -am "Commit $i" || return 1
> > +     done &&
> > +     git -C server config --local uploadpack.allowFilter 1 &&
> > +     git -C server config --local uploadpack.allowAnySha1InWant 1 &&
> > +     HASH=$(git -C server hash-object file3) &&
> > +
> > +     git init promisor-remote &&
> > +     git -C promisor-remote fetch --keep "file://$(pwd)/server" $HASH &&
> > +
> > +     git clone --no-checkout --filter=blob:none "file://$(pwd)/server" client &&
> > +     git -C client remote set-url origin "file://$(pwd)/promisor-remote" &&
> > +     git -C client config extensions.partialClone 1 &&
> > +     git -C client config remote.origin.promisor 1 &&
> > +
> > +     # make sure that index-pack is run from within the repository
> > +     git -C client index-pack $(pwd)/client/.git/objects/pack/*.pack &&
> > +     test_path_is_missing $(pwd)/client/file3
> > +'
>
> How does this check that no lazy fetch has occurred? It seems to me
> that you're just checking the existence of a file in the worktree,
> which does not indicate the presence or absence of a lazy fetch.

What I had in mind was that if the file were lazy-fetched (because
has_object_file_with_flags() fails and fetch_if_missing is not set to 0),
then it would be unpacked and we would find it in the worktree. Since we
prevent this exact behavior by using has_object(), we should not find
such a file in our repo.

> I think the way to test needs to be more complicated: you need
> to create a partial clone, fetch into it from another repo, and
> then verify that no fetches were made to the original partial
> clone.

So, after the fetch, during the pack-indexing phase, we check whether
any additional fetches were made. This makes more sense and would be
much clearer to anyone reading than what I wrote.
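
One way to look for extra fetches might be to trace the command and inspect
the log. The sketch below is not taken from git's test suite, and
trace_has_no_fetch is a hypothetical helper. It rests on two assumptions: that
a lazy fetch shows up as a spawned "git fetch" child process, and that
GIT_TRACE2_EVENT records such a child as a "child_start" event with an "argv"
array.

```shell
# Hypothetical helper: succeed only if the trace2 event log "$1" exists and
# records no child process whose argv contains the word "fetch".
trace_has_no_fetch () {
	# A missing log must not be mistaken for "no fetch happened".
	test -f "$1" || return 2
	! grep '"event":"child_start"' "$1" | grep -q '"fetch"'
}

# Possible use inside a test, once the partial clone "client" exists
# (<pack> stands for the pack being indexed):
#   GIT_TRACE2_EVENT="$(pwd)/trace" git -C client index-pack <pack> &&
#   trace_has_no_fetch trace
```

The trace path is given as an absolute path because the command runs with
-C client.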

Will do a reroll. If there needs to be a change in the commit message
as well, please let me know.

Thanks for the review.


