Re: [PATCH] Re: Bug with "git submodule update" + subrepo with differing path/name?

Stefan Beller <sbeller@xxxxxxxxxx> · Thu, 21 Dec 2017 14:57:40 -0800

On Thu, Dec 21, 2017 at 2:08 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> That sounds like a bit of revisionist history, but you weren't around
> back then, so...
>
> https://public-inbox.org/git/11793556371774-git-send-email-junkio@xxxxxxx/#r
>
> is my summarization of discussions before that time. There is a
> mention of "three-level" thing by Steven Grimm in
> https://public-inbox.org/git/464E4C94.5070408@xxxxxxxxxxxxx/
> in the thread, and together with messages like
> https://public-inbox.org/git/7vejle6p96.fsf@xxxxxxxxxxxxxxxxxxxxxxxx/
> you can see that we already knew that submodule identity (name), path
> in the superproject (path) and where it comes from (url) need to be
> separate things that need to be tied together by .gitmodules.

Taking this private reply back to the list, I hope you don't mind.
I think linking into the archives (hence making the huge unstructured
archive a bit more discoverable by these links) is a good idea.

Yes, I was not around when these discussions happened, hence I came
up with a narrative that I can rationalize best.

This message https://public-inbox.org/git/464CF435.1010405@xxxxxxxxxxxxx/
helps (me) most, as it states the problems that need to be solved, and I agree
with the issues and how to solve them. The "symbolic names", however, are the
crux there. They must not be changed, so as Andreas says, maybe "ID" is
a better notion than "submodule name".

I am of the opinion that we'd even want to go as far as to not expose
this symbolic name to the user.

So if we redesign from scratch, I would not have "gitlink entries"
but rather "special blobs"[1], that contain the submodule sha1,
as well as their ID.

A special blob could look like:

  git cat-file "kernel/"
  version <sha1>
  id <another hash value, determined at creation time>
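
A minimal sketch of how such a blob could be serialized and parsed,
assuming the two-line "key value" layout shown above. The names
SpecialBlob, format_blob and parse_blob are made up for illustration;
nothing like this exists in Git today.

```python
from typing import NamedTuple


class SpecialBlob(NamedTuple):
    version: str  # commit the submodule is pinned to
    id: str       # stable submodule ID, fixed at creation time


def format_blob(blob: SpecialBlob) -> str:
    """Serialize to the hypothetical "key value" line format."""
    return f"version {blob.version}\nid {blob.id}\n"


def parse_blob(text: str) -> SpecialBlob:
    """Parse the hypothetical format; reject unknown, duplicate or missing keys."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep or key not in ("version", "id") or key in fields:
            raise ValueError(f"malformed line: {line!r}")
        fields[key] = value
    if set(fields) != {"version", "id"}:
        raise ValueError("blob must contain exactly 'version' and 'id'")
    return SpecialBlob(fields["version"], fields["id"])
```

Keeping the format to two fixed keys means moving the blob to another
path changes neither the pinned version nor the ID.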

Additionally you'd have slight guidance of id -> URL in
an extra ref, versioned independently of the main history:

  <other hash> -> git://kernel.org/...

(A)
So if linux moves away from kernel.org, you can change that ref
independently of your history at that time, such that even old
versions of your main history can obtain the kernel from the correct
URL. Of course a porcelain command could do the double lookup
such that the user can say "kernel/ -> newURL" without needing
to have knowledge of the internal ID.
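
The double lookup could be sketched roughly as follows. This is only an
illustration under assumptions: plain dicts stand in for a tree
(path -> ID, as read from the special blob) and for the extra ref
(ID -> URL), and the IDs, URLs and function names are placeholders, not
anything Git provides.

```python
def resolve_url(tree: dict, url_ref: dict, path: str) -> str:
    """Double lookup: path -> ID (special blob), then ID -> URL (extra ref)."""
    return url_ref[tree[path]]


def set_url(tree: dict, url_ref: dict, path: str, new_url: str) -> None:
    """What a porcelain could do for "kernel/ -> newURL": the user names a
    path, the ID stays internal, and only the extra ref is updated."""
    url_ref[tree[path]] = new_url


# Placeholder data: two revisions of the main history share one submodule ID.
old_tree = {"kernel/": "id-1234"}
new_tree = {"kernel/": "id-1234", "docs/": "id-5678"}
url_ref = {"id-1234": "git://old.example/linux.git",
           "id-5678": "git://old.example/docs.git"}

set_url(new_tree, url_ref, "kernel/", "git://new.example/linux.git")
# The old revision resolves to the new URL too, because the ref is
# versioned independently of the history.
print(resolve_url(old_tree, url_ref, "kernel/"))
```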

(B)
Now if you want to move the submodule locally, you can do so by
having that special blob at a different path in your tree.
git-mv "just works" even for nested submodules. No need to
rewrite the .gitmodules file.

(C)
What if you want to drop linux and use some BSD?
Then you change the ID such that a new submodule repo
is created inside .git/modules/. Of course it is also pinned
to a different sha1. A porcelain command such as
"git submodule replace" sounds like a good fit for this.

I think this so far is a clean design, but it is incompatible with history.

The really big difference is that the ID is the core thing that everything
revolves around (similar to the submodule name), but is tied to the path
and is hidden as much as possible from the user, who can use given
commands for each of the scenarios.

The big flaw of this is that the ID is sort of random and not based
on content, which is counter to Git's philosophy as a content
addressable FS. Maybe the ID can be set to
HASH(<first version> +<path>), such that when two people
independently of each other make a submodule pointing at the
same commit at the same path, they'd have the same tree-id.
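
A rough sketch of such a derived ID, assuming SHA-1 as the hash and a
NUL byte as the separator between the two inputs (both are my
assumptions for illustration; derive_id is a made-up name):

```python
import hashlib


def derive_id(first_version: str, path: str) -> str:
    """HASH(<first version> + <path>): depends only on the inputs, so two
    people creating the same submodule independently get the same ID."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = f"{first_version}\0{path}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()
```

The ID would be computed once at creation time and then stored, so a
later move of the submodule to a different path would not change it.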

Thanks,
Stefan

[1] An in-office discussion hinted that "special blobs" are not
that special. git-LFS uses this mechanism, but of course their
design flaw is to keep it out of main-Git. But one could posit
"submodules can be implemented using smudge filters alone".