Re: [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules

Glen Choo <chooglen@xxxxxxxxxx> · Thu, 10 Feb 2022 16:51:45 +0800

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Glen Choo <chooglen@xxxxxxxxxx> writes:
>
>> = Background
>>
>> When fetching submodule commits, "git fetch --recurse-submodules" only
>> considers populated submodules, and not all of the submodules in
>> $GIT_DIR/modules as one might expect. As a result, "git fetch
>> --recurse-submodules" behaves differently based on which commit is
>> checked out.
>
> After getting 'init'ed, which is a sign that the user is interested
> in that submodule, when we happen to check out a superproject commit
> that lacks the submodule in question, do we _lose_ the record that it
> was once of interest?  I do not think so.  The cloned copy in
> $GIT_DIR/modules/ would not go away, so we _could_ update it, even if
> there is no checkout, when the superproject we currently have may
> not have the submodule.
>
> But I am not sure if that is a problem.  After all, the recursive
> fetch tries to make sure that the superproject commit that is
> checked out is reproduced as recorded by fetching the submodule
> commit recorded in the superproject commit, not a commit that
> happens to be at the tip of random branch in the submodule.
>
> It is OK to allow fetching into a submodule that does not currently
> have a checkout, but I think we should view it purely as
> prefetching.  We do not even know, after doing such a fetch in the
> submodule, whether we have the commit necessary for the _next_ commit
> in the superproject we will check out.

Hm, I may be misreading your message, but by "tip of random branch in
the submodule", did you mean "tip of random branch in the
_superproject_"?

If so, prior to this series, recursive fetch already fetches submodule
commits that are recorded by superproject commits other than the one
checked out. submodule.c:calculate_changed_submodule_paths() performs a
rev walk starting from the newly fetched superproject branch tips to
find missing submodule commits that are referenced by superproject
commits. These missing submodule commits are explicitly fetched by the
recursive fetch.
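
As a rough illustration of what that walk is looking for (this is not
how submodule.c implements it; the real code diffs commits rather than
listing whole trees), the sketch below lists every submodule commit
recorded by a range of superproject commits. The helper names
gitlinks and referenced_submodule_commits are made up for this
example, and paths are assumed to need no quoting by "git ls-tree".

```shell
# Read `git ls-tree -r` output on stdin ("<mode> <type> <oid>\t<path>")
# and print "<oid>\t<path>" for every gitlink (mode 160000) entry.
gitlinks() {
    awk -F '\t' '{
        split($1, meta, " ")    # meta[1]=mode, meta[2]=type, meta[3]=oid
        if (meta[1] == "160000") print meta[3] "\t" $2
    }'
}

# referenced_submodule_commits <rev-list args>
# Print the distinct gitlinks recorded by the selected superproject commits.
referenced_submodule_commits() {
    git rev-list "$@" | while read -r commit; do
        git ls-tree -r "$commit" | gitlinks
    done | sort -u
}
```

For example, "referenced_submodule_commits
branch-without-submodules..origin/branch-with-submodules" prints the
gitlinks recorded by the superproject commits that are reachable from
origin/branch-with-submodules but not from branch-without-submodules.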

So we already do prefetching, but this series makes the prefetching
smarter by also prefetching in submodules that aren't checked out.

(I think my cover letter could have been clearer; I should have
explicitly called out the fact that we already do prefetching.)

>> This can be a problem, for instance, if the user has a branch with
>> submodules and a branch without:
>>
>>   # the submodules were initialized at some point in history..
>>   git checkout -b branch-with-submodules origin/branch-with-submodules
>>   git submodule update --init
>>
>>   # later down the road..
>>   git checkout --recurse-submodules branch-without-submodules
>>   # no submodules are fetched!
>>   git fetch --recurse-submodules
>>   # if origin/branch-with-submodules has new submodule commits, this
>>   # checkout will fail because we never fetched the submodule
>>   git checkout --recurse-submodules branch-with-submodules
>
> That is expected, and UNLESS we fetched _everything_ offered by the
> remote repository to the submodule in the previous step, there is no
> guarantee that this "recurse-submodules" checkout would succeed.

Yes. With the current prefetching, I don't think we make any guarantee
to the user that all submodule commits will be fetched (even if all of
the submodules are checked out).

But if I understand the "find changed submodules" rev walk correctly, we
look for changed submodules in the ancestry chains of the branch tips
(but I'm not sure how the rev walk decides to stop). So we might be
_very close_ to fetching all the commits that we think users care about
even though we don't guarantee that all commits will be fetched.
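
One way to see how close a given fetch came is to check, for each
recorded submodule commit, whether the object is present in the
submodule's clone. The sketch below assumes each submodule's name
equals its path (so its clone lives at <modules-dir>/<path>) and reads
"<oid><TAB><path>" lines on stdin; has_commit and
missing_submodule_commits are hypothetical helpers, not git commands.

```shell
# has_commit <git-dir> <oid>
# Succeed if <oid> names a commit that is present in <git-dir>.
has_commit() {
    git --git-dir="$1" cat-file -e "$2^{commit}" 2>/dev/null
}

# missing_submodule_commits <modules-dir>
# Read "<oid>\t<path>" lines on stdin; print those whose commit is absent
# from the clone at <modules-dir>/<path>. Submodules that were never
# cloned are skipped, since there is nothing to fetch into.
missing_submodule_commits() {
    modules_dir=$1
    while IFS="$(printf '\t')" read -r oid path; do
        [ -d "$modules_dir/$path" ] || continue
        has_commit "$modules_dir/$path" "$oid" ||
            printf '%s\t%s\n' "$oid" "$path"
    done
}
```

Given the gitlink entries of a superproject commit (the mode 160000
lines of "git ls-tree -r <commit>", reduced to oid and path) and
"$(git rev-parse --git-common-dir)/modules" as the directory, the
output is what a later "git checkout --recurse-submodules" of that
commit would likely be missing.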

>> This series makes "git fetch" fetch the right submodules regardless of
>> which commit is checked out, as long as the submodule has already been
>> cloned. In particular, "git fetch" learns to:
>>
>> 1. read submodules from the relevant superproject commit instead of
>>    the file system
>> 2. fetch all changed submodules, even if they are not populated
>
> The real question is not "in which submodules we fetch", but "what
> commits we fetch in these submodules".  I do not think there is a
> good answer to the latter.
>
> Of course, if we take this sequence instead:
>
> 	git checkout branch-with-submodules
> 	git fetch --recurse-submodules
> 	git checkout --recurse-submodules branch-with-submodules
>
> things should work correctly (I think we both are assuming that the
> other side allows fetching _any_ object, not just refs), as "fetch"
> knows what superproject commit it is asked to complete, unlike the
> previous example you gave, where it does not have a clue on what
> superproject commit it is preparing submodules for, right?

So, given my prior description of recursive fetch, we actually _do_ know
which superproject commits to prepare for and which submodule commits to
fetch.

> Also, if the strategy is to prefetch in all submodules that were
> 'init'ed, we do not have to read .gitmodules from the superproject
> commit at all, right?  We can just go check .git/modules/ which is
> available locally.  We need to see which submodules are of local
> interest by consulting .git/config and/or .git/modules/ anyway even
> if we read .gitmodules from the superproject commit to learn what
> modules are there.

Hm, good point. Finding submodules of interest in .git/modules or
.git/config sounds like common sense (at least, it's more obvious than
trying to identify all submodules by doing a rev walk).

That said, just looking at what submodules we have doesn't tell us which
submodule commits we need, which is why we have the "find changed
submodules" rev walk. And since we already have the rev walk (which
tells us which superproject commits we care about), it's not that much
effort to fetch non-checked-out submodules.

So I think we'd eventually want to consult .git/modules and .git/config
(we'll have to do this when we start teaching "git fetch" to clone new
submodules, for example) but it's unnecessary for this series.
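
For reference, a minimal sketch of what consulting .git/config could
look like from the command line. It relies on "git submodule init"
recording submodule.<name>.url in the superproject's config, and it
assumes submodule names contain no spaces; the function names are
hypothetical.

```shell
# Read `git config --get-regexp` output ("<key> <value>" per line) on
# stdin and print the name of every submodule that has a url entry.
initialized_submodule_names() {
    sed -n 's/^submodule\.\([^ ]*\)\.url .*$/\1/p'
}

# Print the names of the submodules initialized in the current repository.
list_initialized_submodules() {
    git config --get-regexp '^submodule\..*\.url$' |
        initialized_submodule_names
}
```

Each printed name could then be checked against
"$(git rev-parse --git-common-dir)/modules/<name>" to see whether the
submodule has actually been cloned.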