Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

>> Here's a few examples:
>>
>> 1. Suppose I track my $HOME directory as a git repository. Within my
>>    home directory, I have a src/git/ subdirectory with a clone of
>>    git.git, but I never intended to treat this as a submodule.
>>
>>    If I run "git rev-parse --show-superproject-working-tree", then it
>>    will discover my home directory repository, run ls-files in there
>>    to see if it has GITLINK entries, and either see one for src/git if
>>    I had "git add"ed it by mistake or not see one. In either case,
>>    it would view my src/git/ directory as being a submodule
>>    of my home directory even though I hadn't intended it to be so.
>
> I am not sure about this one. If you added an unrelated one with
> "git add" by mistake, you'd want to know about the mistake sooner
> rather than later, no?

My point with this example is that it's useful to have a definition of
what is a submodule repository, to make it unambiguous whether this
repository is a submodule or whether it's just a repository that
happens to have been cloned inside of a git-managed worktree.

For the specific example of having run "git add", I don't have any
very strong opinions.

[...]
>> 2. Suppose I have a copy of a repository such as
>>    https://gerrit.googlesource.com/gerrit/, with all its submodules.
>>    I am in the plugins/replication/ directory.
[...]
>>    So for example, if I had run "git rm --cached
>>    plugins/replication" to _prepare to_ remove the plugins/replication
>>    submodule, then "git rev-parse --show-superproject-working-tree"
>>    will produce the wrong result.
>
> Yes, looking only at the index of the superproject will have that
> problem, but don't other things in the superproject point at the
> submodule, too, e.g. submodule.<name>.* configuration variables?

What all of those suggested alternatives have in common is that they
are pointers from another repository to the submodule.
This would be the first time in git history that we are saying a
property of a repository depends on having to examine files outside
of it.

I guess the main question I'd have is, why _wouldn't_ I want a
submodule to be able to point to the superproject containing it? I
can think of many advantages to having that linkage, and the main
disadvantage I can think of is that it is a change.

I don't think that submodule.<name>.* is an adequate substitute for
having this setting, because it requires

- finding the superproject
- mapping the <name> to a path, using .gitmodules
- comparing the path to the submodule location

which would be complex, slow, and error-prone.

The one thing that I think could approach being an adequate
substitute is examining the path to the current repository and
stripping off path components until we find modules/; then the parent
is the containing superproject. That would only work for absorbed
submodules, though, and it would be less explicit than having a
config item.

> And then, after removing them to truly dissociate the submodule from
> the superproject, "git rev-parse --show-superproject-working-tree"
> may stop saying that it is a submodule, but this series wants to
> make it irrelevant what the command says. Until you unset the
> configuration variable in the submodule, it will stay to be a
> submodule of the superproject, but the superproject no longer thinks
> it is responsible for the submodule. You'll have to deal with an
> inconsistent state during the transition either way, so I am not
> sure it is the best solution to introduce an extra setting that can
> easily go out of sync.

This hints at a reason why one wouldn't want the linkage back ---
dealing with the ambiguity of inconsistencies (what if a submodule
declares a superproject but the superproject does not declare the
submodule?).
I would not expect that ambiguity to be much of a problem, because
the typical way to use superproject linkage would be to print output
from commands like "git status": for example,

    This is a submodule of ../../gerrit; you can run

        git -C ../../gerrit status

    to get the status of the superproject.

An inconsistency could occur due to the user using "mv" (instead of
"git mv") to move a submodule to a path that is a different number of
path components away from its superproject. One way to handle that
would be to make submodules record a boolean setting reflecting
whether they are a submodule, instead of the path to the
superproject. (This would be similar to settings like core.bare.)
Alternatively, if the path to the superproject is recorded and if
"git fsck" is able to notice such an inconsistency, then the user
should be able to have an okay experience repairing it.

[...]
>> If "git status" runs "git rev-parse
>> --show-superproject-working-tree", then git would walk up the
>> filesystem above my mawk/ directory, looking for another .git dir.
>> We can reach an NFS automounter directory and just hang. Even
>> without an NFS automounter, we'd expect this to take a while
>> because, unlike normal repository discovery, we have no reason to
>> believe that the walk is going to quickly discover a .git directory
>> and terminate. So this would violate user expectations.
>
> It would be a problem, but I do not know if "this is a submodule of
> that superproject" link is the only solution, let alone the most
> effective one. It seems to me that you are looking more for
> something like GIT_CEILING_DIRECTORIES.

Who is the "you" addressed here? The end user can use
GIT_CEILING_DIRECTORIES if they are expecting to run git commands
within an NFS automounter directory and outside of any git
repository, but they'd be right to be surprised if that suddenly
became required when inside git repositories.
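
To make the cost of that extra walk concrete, here is a rough Python
sketch of an upward search for an enclosing repository. It is not
git's implementation: the function name find_enclosing_repo and the
simplified handling of ceiling directories (stop as soon as the walk
would enter one) are assumptions made for illustration only.

```python
from pathlib import Path


def find_enclosing_repo(start, ceilings=()):
    """Look for a directory containing a .git entry strictly above `start`.

    Returns (repo_root_or_None, number_of_directories_probed).
    Illustrative only; real git discovery has more rules than this.
    """
    ceiling_set = {Path(c).resolve() for c in ceilings}
    probes = 0
    current = Path(start).resolve().parent
    while True:
        # Simplified ceiling rule (an assumption): never probe a ceiling dir.
        if current in ceiling_set:
            return None, probes
        probes += 1  # each probe is a filesystem access that may be slow
        if (current / ".git").exists():
            return current, probes
        if current.parent == current:  # reached the filesystem root
            return None, probes
        current = current.parent
```

Without a ceiling, a repository that is not a submodule pays for
every probe up to the filesystem root, and any of those probes can
land in a slow or automounted directory.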
I don't think we should assume that running an extra .git discovery
walk is cost-free to users who are not using submodules and an
acceptable burden to impose on them for the sake of submodule users.

Thanks and hope that helps,
Jonathan
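
For reference, the path-stripping alternative mentioned above (strip
path components from the current repository's path until modules/ is
found) could be sketched roughly as follows. This is a hedged
illustration, not git code: superproject_git_dir is a hypothetical
helper, the example paths are made up, and it assumes the absorbed
layout in which a submodule's git dir lives under
<superproject git dir>/modules/<name>.

```python
from pathlib import PurePosixPath


def superproject_git_dir(submodule_git_dir):
    """Return the git dir whose modules/ directory contains the given
    git dir, or None if no ancestor is named "modules".

    Pure path manipulation; nothing on disk is examined.
    """
    path = PurePosixPath(submodule_git_dir)
    # parents is ordered nearest-first, so the closest modules/ wins.
    for ancestor in path.parents:
        if ancestor.name == "modules":
            return ancestor.parent
    return None


# Hypothetical absorbed submodule named "plugins/replication":
print(superproject_git_dir("/home/me/gerrit/.git/modules/plugins/replication"))
# A plain clone that is not absorbed has no modules/ ancestor:
print(superproject_git_dir("/home/me/src/git/.git"))
```

The second call returns None, which matches the limitation that this
only works for absorbed submodules. The sketch would also be misled
by a submodule name that itself contains a "modules" path component,
which is one way it is less explicit than a config item.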