Re: [PATCH v2 7/9] fetch: fetch unpopulated, changed submodules

Glen Choo <chooglen@xxxxxxxxxx> · Thu, 17 Feb 2022 01:33:25 +0800

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> On Wed, Feb 16 2022, Glen Choo wrote:
>
>> Glen Choo <chooglen@xxxxxxxxxx> writes:
>>
>>> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>>>
>>>> Glen Choo <chooglen@xxxxxxxxxx> writes:
>>>>> +	# Create new superproject commit with updated submodules
>>>>> +	add_upstream_commit &&
>>>>> +	(
>>>>> +		cd submodule &&
>>>>> +		(
>>>>> +			cd subdir/deepsubmodule &&
>>>>> +			git fetch &&
>>>>> +			git checkout -q FETCH_HEAD
>>>>> +		) &&
>>>>> +		git add subdir/deepsubmodule &&
>>>>> +		git commit -m "new deep submodule"
>>>>> +	) &&
>>>>> +	git add submodule &&
>>>>> +	git commit -m "new submodule" &&
>>>>
>>>> I thought add_upstream_commit() would do this, but apparently it just
>>>> adds commits to the submodules (which works for the earlier tests that
>>>> just tested that the submodules were fetched, but not for this one). I
>>>> think that all this should go into its own function.
>>
>> I'm testing out a function that does exactly what these lines do, i.e.
>> create a superproject commit containing a submodule change containing a
>> deepsubmodule change. That works pretty well and it makes sense in the
>> context of the tests.
>>
>>>> Also, I understand that "git fetch" is there to pick up the commit we
>>>> made in add_upstream_commit, but this indirection is unnecessary in a
>>>> test, I think. Can we not use add_upstream_commit and just create our
>>>> own in subdir/deepsubmodule? (This might conflict with subsequent tests
>>>> that use the old scheme, but I think that it should be fine.)
>>
>> We can avoid the "git fetch", but we first need to fix an inconsistency in
>> how the submodules are set up. Right now, we have:
>>
>>   test_expect_success setup '
>>     mkdir deepsubmodule &&
>>     [...]
>>     mkdir submodule &&
>>     (
>>     [...]
>>       git submodule add "$pwd/deepsubmodule" subdir/deepsubmodule &&
>>       git commit -a -m new &&
>>       git branch -M sub
>>     ) &&
>>     git submodule add "$pwd/submodule" submodule &&
>>     [...]
>>     (
>>       cd downstream &&
>>       git submodule update --init --recursive
>>     )
>>   '
>>
>> resulting in a directory structure like:
>>
>> $pwd
>> |_submodule
>>   |_subdir
>>     |_deepsubmodule
>> |_deepsubmodule
>>
>> and upstream/downstream dependencies like:
>>
>> upstream                             downstream            
>> --------                             ----------
>> $pwd/deepsubmodule                   $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
>>                                      $pwd/submodule/subdir/deepsubmodule
>>
>>
>> So we can't create the commit in submodule/subdir/deepsubmodule, because
>> that's not where our SUT would fetch from.
>>
>> Instead, we could fix this by having a more consistent
>> upstream/downstream structure:
>>
>> $pwd
>> |_submodule
>>   |_subdir
>>     |_deepsubmodule
>>
>> upstream                             downstream            
>> --------                             ----------
>> $pwd/submodule/subdir/deepsubmodule  $pwd/downstream/submodule/subdir/deepsubmodule (SUT)
>>
>> This seems more convenient to test, but before I commit to this, is
>> there a downside to this that I'm not seeing?
>
> Won't this sort of arrangement create N copies of e.g. a zlib.git or
> some other common library that might be used by N submodules?
>
> But I haven't read all the context. I'm assuming you're talking about
> how we store in-tree a/b and x/y/b submodules now; do we store both of
> those in .git/ as .git/modules/b.git or whatever? I can't remember... :)

Ah, the problem I'm describing is much simpler; it's just "how do we want
our test setup (which has submodules) to look?"
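To make the proposal concrete, here is a rough standalone sketch of the
consistent layout above, written as a plain script rather than inside
test_expect_success. It assumes a throwaway directory from mktemp instead
of the test's trash directory, and the add_deep_change helper and git_sm
wrapper are hypothetical names, not existing t5526 functions.

```shell
#!/bin/sh
# Rough standalone sketch (not the actual t5526 code) of the proposed layout:
# $pwd/submodule/subdir/deepsubmodule is the only upstream that
# downstream/submodule/subdir/deepsubmodule fetches from.
set -e

# Throwaway identity so "git commit" works outside the test suite.
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

# Recent Git releases refuse submodule clones over the local "file"
# transport by default, so allow them explicitly for this sketch.
git_sm () { git -c protocol.file.allow=always "$@"; }

pwd=$(mktemp -d)
cd "$pwd"
git init -q

# The upstream deep submodule lives inside the upstream submodule's worktree.
git init -q submodule
mkdir -p submodule/subdir/deepsubmodule
(
	cd submodule/subdir/deepsubmodule &&
	git init -q &&
	echo deep >deepfile &&
	git add deepfile &&
	git commit -q -m "deep initial"
)
(
	cd submodule &&
	echo sub >subfile &&
	git add subfile &&
	# The path already holds a repository, so it is staged, not cloned.
	git_sm submodule add "$pwd/submodule/subdir/deepsubmodule" subdir/deepsubmodule &&
	git commit -q -m "sub initial"
)
git_sm submodule add "$pwd/submodule" submodule
git commit -q -m "super initial"

# Downstream clone whose nested submodules are the system under test.
git clone -q . downstream
(cd downstream && git_sm submodule update --init --recursive)

# Hypothetical helper: one superproject commit that bumps the submodule,
# which in turn bumps the deep submodule. No "git fetch" is needed because
# the deep commit is made directly in the repo downstream fetches from.
add_deep_change () {
	(
		cd "$pwd/submodule/subdir/deepsubmodule" &&
		echo "$1" >>deepfile &&
		git commit -q -am "deep: $1"
	) &&
	(
		cd "$pwd/submodule" &&
		git add subdir/deepsubmodule &&
		git commit -q -m "sub: $1"
	) &&
	(
		cd "$pwd" &&
		git add submodule &&
		git commit -q -m "super: $1"
	)
}

add_deep_change "second"
```

With this layout, a later test only needs to call the helper and then run
the fetch under test in downstream.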

But we can also consider the question you are asking :)
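On the storage question: as far as I know, nothing is shared today. Each
submodule gets its own git directory at .git/modules/<name>, and <name>
defaults to the submodule's path, so a/b and x/y/b would live at
.git/modules/a/b and .git/modules/x/y/b even when both come from the same
URL. A minimal sketch to check this, assuming a Git recent enough to need
protocol.file.allow for local-path submodules; the lib/super repositories
and the git_sm wrapper are made up for illustration:

```shell
#!/bin/sh
# Sketch: two submodules cloned from the same URL at different paths
# end up with two independent git directories under .git/modules/.
set -e

# Throwaway identity so "git commit" works outside the test suite.
export GIT_AUTHOR_NAME=Tester GIT_AUTHOR_EMAIL=tester@example.com
export GIT_COMMITTER_NAME=Tester GIT_COMMITTER_EMAIL=tester@example.com

# Allow local-path submodule clones, which recent Git refuses by default.
git_sm () { git -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)
cd "$work"

# A stand-in for a shared library such as sha1collisiondetection.git.
git init -q lib
(cd lib && echo x >libfile && git add libfile && git commit -q -m lib)

git init -q super
cd super
git_sm submodule add "$work/lib" a/b
git_sm submodule add "$work/lib" x/y/b
git commit -q -m "two submodules from one URL"

# Each submodule's name defaults to its path and its gitdir lives at
# .git/modules/<name>, so the same objects are stored (and fetched) twice.
ls -d .git/modules/a/b .git/modules/x/y/b
```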

> Whatever we do now, I do think the caveat I've noted above is interesting
> when it comes to submodule design, e.g. if git.git and
> some-random-thing.git both bring in the same sha1collisiondetection.git
> from the same GitHub URL, should those be the same in our underlying
> storage?
>
> I think the answer to that would ideally be both "yes" and
> "no".
>
> I.e. "yes" because it's surely handy for "git fetch": you don't need to
> fetch the same stuff twice in the common case of just updating all our
> recursive submodules.

Hm, and it would save space on disk.

> And also "no" because maybe some users would really consider them
> different. E.g. you may want to "cd git/" and adjust the git.git one
> and create a branch there for some hotfix it needs, which would not be
> needed/wanted by some-random-thing.git.

I don't think we could say "yes" for all users, because the subset of
users you describe here will probably appreciate them being separate.

But I can imagine doing this manually, e.g. via a "git submodule dedupe"
command that lets users who really need it opt into this risky setup
where submodules are shared. Does anyone really need it, though? I'm not
sure yet.