Re: [PATCH v4 2/3] implement fetching of moved submodules

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 18 Oct 2017 10:56:58 -0700

On Tue, Oct 17, 2017 at 5:03 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>>> +                       /* make sure name does not collide with existing one */
>>> +                       submodule = submodule_from_name(commit_oid, name);
>>> +                       if (submodule) {
>>> +                               warning("Submodule in commit %s at path: "
>>> +                                       "'%s' collides with a submodule named "
>>> +                                       "the same. Skipping it.",
>>> +                                       oid_to_hex(commit_oid), name);
>>> +                               name = NULL;
>>> +                       }
>>
>> This is the ugly part of using one string list and storing names or
>> path in it. I wonder if we could omit this warning if we had 2 string lists?
>
> We are keying off of 'name', because that is what will give a module
> its identity.  If we have a gitlink whose path is not in .gitmodules
> in the same tree, then we are seeing an unregistered submodule.

Right, so it has no submodule-specific identity and we chose to "fake it"
by pretending its path is its name. However, this requires checking, as
there might be overlap between the name namespace and the path namespace.

>  If
> we were to "git add" it, then we'd use its path as the default name,

I presume "git submodule add"

> but if we already have a submodule with that name (the most likely
> explanation for its existence is because it started its life there
> and then later moved), and the submodule is bound to a different
> path, then that is a different submodule.  Skipping and warning both
> are sensible thing to do.

Skipping and warning is sensible once we decide to go this way.

I propose to take a step back and not throw away the information
whether the given string is a name or path, as then we do not have
to warn&skip, but we can treat both correctly.

As we only need to store an additional boolean (is it path or name?),
I had suggested to just use two lists, one for key-by-name and one for
key-by-path, where we intend to use the key-by-name list for submodules
and the by-path list only for those with no name (i.e. lone gitlinks), hence
making the latter a "fallback list".
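
To make the two-list idea concrete, here is a rough standalone sketch.
It is not the patch and does not use git's string_list API; key_list,
changed_submodules and record_changed() are hypothetical names made up
for illustration, and the fixed-size array is only there to keep the
example short.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHANGED 64

/* Hypothetical stand-in for a string list; not git's string_list. */
struct key_list {
	const char *items[MAX_CHANGED];
	size_t nr;
};

struct changed_submodules {
	struct key_list by_name; /* submodules registered in .gitmodules */
	struct key_list by_path; /* fallback: lone gitlinks without a name */
};

static int key_list_has(const struct key_list *list, const char *key)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i], key))
			return 1;
	return 0;
}

static void key_list_add(struct key_list *list, const char *key)
{
	/* Silently ignores duplicates and anything beyond MAX_CHANGED. */
	if (list->nr < MAX_CHANGED && !key_list_has(list, key))
		list->items[list->nr++] = key;
}

/*
 * Record a changed gitlink. 'name' is NULL when the path has no entry
 * in .gitmodules; in that case the path goes to the fallback list, so
 * it can never be mistaken for a submodule name.
 */
static void record_changed(struct changed_submodules *cs,
			   const char *name, const char *path)
{
	if (name)
		key_list_add(&cs->by_name, name);
	else
		key_list_add(&cs->by_path, path);
}
```

With this shape, a submodule named "lib" and a lone gitlink at path
"lib" end up in different lists, so neither has to be skipped.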

>
> I do not know what you see as ugly here,

the necessity of warn&skip instead of having a solution that
works in corner cases just fine.

> and more importantly, I am
> not sure how having two lists would help.

The current situation is that we use the path of the submodules only,
which makes it work without warn&skip, but it has other disadvantages
(i.e. new & moved submodules are not detected), which we want to fix.

We can add this functionality without caving in to skip the corner case
by storing an additional bit of information. A rename is detected by
the name staying constant before and after while only the path changes.
So we could continue to use by-path logic and only have the name
for rename detection. However that seems to be ugly, too. So we
seem to think that the by-name is better (as it is more in line with what
we think should happen, it is easier to explain, review and maintain(?)).
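
For reference, the rename detection described above could look roughly
like the following. This is only a sketch under assumptions: sub_entry
and moved_to() are hypothetical, and the (name, path) pairs are assumed
to have been read from .gitmodules of the old and the new commit by
some other code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (name, path) pair as recorded in one .gitmodules. */
struct sub_entry {
	const char *name;
	const char *path;
};

/*
 * Return the new path if the submodule called 'name' exists in both
 * snapshots with different paths, i.e. it was moved. Return NULL if
 * it is missing from either snapshot or its path is unchanged.
 */
static const char *moved_to(const struct sub_entry *old_entries, size_t old_nr,
			    const struct sub_entry *new_entries, size_t new_nr,
			    const char *name)
{
	size_t i, j;

	for (i = 0; i < old_nr; i++) {
		if (strcmp(old_entries[i].name, name))
			continue;
		for (j = 0; j < new_nr; j++) {
			if (strcmp(new_entries[j].name, name))
				continue;
			if (strcmp(old_entries[i].path, new_entries[j].path))
				return new_entries[j].path;
			return NULL; /* same name, same path: not moved */
		}
	}
	return NULL;
}
```

The name is the only thing compared across the two snapshots; the path
is just the attribute that is allowed to differ.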

So we could have by-name keys, with the extra information of whether the
key is genuine or a "fake" key, which is to be resolved to a path instead.
And as that is just one bit, I proposed two lists for that.

Am I missing an essential part here?

Thanks,
Stefan