Re: [PATCH v4 09/10] fetch: fetch unpopulated, changed submodules

Glen Choo <chooglen@xxxxxxxxxx> · Fri, 04 Mar 2022 16:37:51 -0800

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Glen Choo <chooglen@xxxxxxxxxx> writes:
>
>> And now that you've pointed this out, I realize that we could do away
>> with (.super_oid, .path) altogether if we had a variant of
>> repo_submodule_init() that takes the submodule name instead of (path,
>> treeish_name). (We have a similar submodule_from_name(), but that only
>> reads the submodule config, not a struct repository.) I would prefer not
>> to introduce such a function so late into the review cycle, but I could
>> clean this up later.
>
> I am puzzled.  What do you exactly mean by "late into the review
> cycle"?

I mean that reviewers have already seen several iterations of this, and
I'm afraid that a refactor might introduce unnecessary cognitive
overhead.

But of course, we might decide that the refactor is a good enough idea
that we want to do it anyway :)

>> - The changed_submodules string_list is basically a map that tells us,
>>   for a given submodule _name_, which commits we need to fetch and where
>>   repo_submodule_init() can read the submodule name from.
>> - We only use cs_data as a string_list_item.util, and the
>>   string_list_item.string is the submodule name itself.
>> - .new_commits tells us which commits to fetch.
>> - .super_oid and .path tells repo_submodule_init() how to get the name
>>   of the submodule.
>>
>> So we don't need to make this a 3-tuple.
>
> OK.  We need to learn in which local repository houses the submodule
> we discover in cs_data resides.  It may or may not have a checkout
> in the current checkout of the superorject commit.  And just one
> <.super_oid, .path> tuple should be sufficient to tell us that,
> because the mapping from submodule name to path may change as "git
> mv" moves it around, but the mapping from submodule name to where
> the submodule repository is stored in the .git/ directory of the
> superproject should not change.  Am I following you so far
> correctly?

Yes, that's correct.

> I am wondering if we need even one <.super_oid, .path> tuple.
> Looking at the implementation of repo_submodule_init(), I have a
> feeling that a version of "initialize named submodule in a given
> tree-ish in the superproject" would be rather trivial.  We already
> have submodule name, so submodule_name_to_gitdir() would be all we
> need, no?  After all, we are only interested in fetching objects to
> fill missing commits (and possibly update the remote tracking
> branches) and do not care about touching its working tree.  And once
> we learn that .git/modules/<name>/ directory, we can fetch the
> necessary commits into it, right?
>
> Or am I oversimplifying the problem?

I don't think you are oversimplifying. Now that I look at it again, it
really _does_ seem trivial. Doing this refactor saves me the headache of
explaining why we need a single <.super_oid, .path> tuple, and saves
readers the headache of figuring out if I'm right.

I'll try it and see if it really makes things simpler or not.