Re: [PATCH 0/8] fetch --recurse-submodules: fetch unpopulated submodules

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 10 Feb 2022 09:40:20 -0800

Glen Choo <chooglen@xxxxxxxxxx> writes:

>> It is OK to allow fetching into a submodule that does not currently have
>> a checkout, but I think we should view it purely as prefetching.  We
>> do not even know, after doing such a fetch in the submodule, we have
>> the commit necessary for the _next_ commit in superproject we will
>> check out.
>
> Hm, I may be misreading your message, but by "tip of random branch in
> the submodule", did you mean "tip of random branch in the
> _superproject_"?

No, I meant something like "git submodule foreach 'git fetch --all'"
(or without '--all' to fetch whatever the refspec there tells us),
i.e. tips of branches in the submodule.

>> The real question is not "in which submodules we fetch", but "what
>> commits we fetch in these submodules".  I do not think there is a
>> good answer to the latter.
>>
>> Of course, if we take this sequence instead:
>>
>> 	git checkout branch-with-submodules
>> 	git fetch --recurse-submodules
>> 	git checkout --recurse-submodules branch-with-submodules
>> 	
>> things should work correctly (I think we both are assuming that the
>> other side allows to fetch _any_ object, not just ref), as "fetch"
>> knows what superproject commit it is asked to complete, unlike the
>> previous example you gave, where it does not have a clue on what
>> superproject commit it is preparing submodules for, right?
>
> So, given my prior description of recursive fetch, we actually _do_ know
> which superproject commits to prepare for and which submodule commits to
> fetch.

Just to make sure I understand what is going on, let me rephrase.

 * To find out which submodule commits we need to fetch, we find new
   commits in the superproject we just fetched, inspect the trees of
   these commits to see gitlinks that name commits we need to fetch
   into the submodule repositories.

 * For that to work well, we need to know, from the path at which
   each gitlink appears in the trees of the superproject, which
   submodule to fetch the commit from.  And to map paths to
   submodule names, we need to read .gitmodules from the same
   superproject commit we found the submodule commit in (as the
   submodule may have moved around during the history of the
   superproject).
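
To illustrate why the mapping has to come from the same superproject
commit, here is a small Python sketch.  It is not how Git parses
.gitmodules; parse_gitmodules() is a simplified, hypothetical helper,
and the module name "libfoo", the paths and the URL are made up.  It
only shows that the same module name can sit at different paths in
two superproject commits.

```python
import re

def parse_gitmodules(text):
    """Return {path: name} from .gitmodules-style text (simplified parser)."""
    path_to_name = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r'\[submodule "(.+)"\]', line)
        if header:
            name = header.group(1)
            continue
        key, sep, value = line.partition("=")
        if sep and name is not None and key.strip() == "path":
            path_to_name[value.strip()] = name
    return path_to_name

# Hypothetical .gitmodules contents at an older and a newer superproject commit.
OLD = '[submodule "libfoo"]\n\tpath = vendor/foo\n\turl = https://example.com/foo.git\n'
NEW = '[submodule "libfoo"]\n\tpath = third_party/foo\n\turl = https://example.com/foo.git\n'

old_map = parse_gitmodules(OLD)
new_map = parse_gitmodules(NEW)

# The same module is found under a different path in each commit.
print(old_map)
print(new_map)
```

A gitlink at vendor/foo in the older commit can only be resolved to
"libfoo" with the older mapping; the newer mapping does not know that
path at all.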

If so, I understand why being able to read .gitmodules from
superproject commits is essential.  The flow would become like

 (1) fetch in the superproject

 (2) iterate over each new superproject commit:
     - read its .gitmodules
     - iterate over each gitlink found in the superproject commit:
       - map the path we found the gitlink at to a module name
       - find the submodule repository initialized for the module
         - if the submodule is not of local interest, skip
         - add the submodule commit pointed to by the gitlink to the
           set of commits that need to be fetched for the submodule [*]

 (3) iterate over each submodule in which we found one or more
     commits that need to be fetched, and fetch these commits (we do
     not have to go over the network to re-fetch commits that exist
     in the object store and are reachable from the refs, but
     "fetch" already knows how to optimize that).
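
To make sure the flow is unambiguous, here is a rough Python model of
steps (2) and (3).  It is only a sketch under made-up assumptions, not
Git's implementation: each new superproject commit is a dict carrying
its own path-to-name mapping (as read from its .gitmodules) and its
gitlinks, and "active_modules" stands in for "submodules of local
interest".  All names and commit IDs are hypothetical.

```python
from collections import defaultdict

def commits_to_fetch(new_commits, active_modules):
    """Step (2): collect, per submodule name, the commits named by gitlinks."""
    wanted = defaultdict(set)
    for commit in new_commits:
        # Use the mapping from this very commit, since paths may have moved.
        path_to_name = commit["gitmodules"]
        for path, sub_commit in commit["gitlinks"].items():
            name = path_to_name.get(path)
            if name is None or name not in active_modules:
                continue  # unknown path, or not of local interest: skip
            wanted[name].add(sub_commit)
    return dict(wanted)

# Two hypothetical new superproject commits; "libfoo" moved between them.
new_commits = [
    {"gitmodules": {"vendor/foo": "libfoo", "docs": "docs"},
     "gitlinks": {"vendor/foo": "aaa111", "docs": "ddd444"}},
    {"gitmodules": {"third_party/foo": "libfoo", "docs": "docs"},
     "gitlinks": {"third_party/foo": "bbb222", "docs": "ddd444"}},
]

plan = commits_to_fetch(new_commits, active_modules={"libfoo"})

# Step (3): one fetch per submodule that has at least one wanted commit.
for name, commits in sorted(plan.items()):
    print(name, sorted(commits))
```

Here both gitlinks end up in the set for "libfoo" even though the
path differs between the two commits, and "docs" is skipped because
it is not of local interest.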

Am I on the right track?

Thanks.