Re: Future structure of submodules and .git/, .git/modules/* organization

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 15 Apr 2021 14:41:11 -0700

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> I very much wish that we could eventually make the use of submodules
> totally transparent, i.e. (taking the example of git.git):
>
>  * You clone, and we just get objects from
>    https://github.com/cr-marcstevens/sha1collisiondetection.git too
>
>  * The fact that we have:
>
>    160000 commit 855827c583bc30645ba427885caa40c5b81764d2  sha1collisiondetection
> ...

I am afraid that the story is not that simple (I wish it were).

There are at last two opposing ways submodules are to be used.  The
original motivation was to borrow an external project as part of
your project, and the way we use SHA1DC is fairly close to it (but
not quite).  In the context of such a usage

	git commit -m "message" --recurse-submodules

would often not be an appropriate operation.  A message that is
suitable in the context of the entire project would not be in the
context of the project that is bound to your project as a submodule,
and for your changes to be reusable by the folks who own the borrowed
project to make sense, your change should be defensible on its own,
"it helps this project that happens to use you as a submodule" by
itself is not all that convincing.

The other way submodule often gets used is to artificially split a
logically single project into many subdirectories and make them into
separate repositories, the top-level project binding them as
submodules.  An submodule in such an arrangement may not even make
sense as a standalone project---this pattern was only brought into
usage because without the more recent inventions like lazy/partial
clones and sparse checkouts, large projects did not fit within a
single repository.

With such an arrangement, of course it makes perfect sense for
things like

	git commit -m "message" --recurse-submodules
	git grep --recurse-submodules

to work as if you are working inside a single repository, by
definition.  You are splitting a logically single project into
multiple submodules as a workaround, and then still wanting to
treat them as a single project, after all.

Supporting those who want to use "collection of submodules as if it
were a single monolithic project" well is a worthy goal, but I do
not think it is healthy to assume that is the only use and forget
about use cases that would benefit from a clear boundary at
submodules (e.g. not sharing commit log message, a change at the
toplevel project may consist of multiple changes at the submoudle
level, etc.).

Thanks.