It's necessary for a superproject to know which submodules it contains. However, submodules have historically not known that they are anything but a normal single-repo Git project (or a superproject, in nested-submodule cases). This decision spares us from having to support divergent behavior in submodules vs. superprojects. That keeps Git (somewhat) less confusing for users and helps simplify our code.

One could imagine, though, some conveniences we could gain from submodules learning added behavior (though not necessarily *different* behavior). Such behavior could provide more context about the state of the project as a whole, and make large submodule-based projects easier to work with. One example is a series[1] I sent some time ago, adding a config to be shared between the superproject and all its submodules. The RFC[2] I sent around the same time mentions a few other examples, such as "git status" run in the submodule noticing whether the superproject has a commit referencing the submodule's current HEAD.

It's expensive and non-definitive to try to guess whether or not the current repo is a submodule. submodule.c:get_superproject_working_tree() does so by essentially running 'git -C .. ls-files -- <own-path>', invoking an additional process. get_superproject_working_tree() is not called often, so that's mostly fine. However, [1] attempted to include an additional config located in the superproject's gitdir by running 'git -C .. rev-parse --git-dir' during startup. That is a little expensive in the best case, because it's an extra process. It is extremely expensive when the current repo is *not* a submodule, because we hunt all the way up the filesystem looking for a '.git'. Adding that cost to every startup is infeasible.

To that end, in this series I propose caching a path to the superproject's gitdir. The superproject would write a relative path to the submodule's config on creation or update.
The goal here is *not* to say "If I am a submodule, I must have submodule.superprojectGitDir set", but instead to say "If I have submodule.superprojectGitDir set, then I must be a submodule." That is, I expect we will find edge cases where a submodule was introduced in some interesting way that bypasses all of the patches below. Such a submodule therefore won't have the superproject's gitdir cached.

The combination of these two rules:

 - Anything relying on submodule.superprojectGitDir must be nice to have, but not essential, because
 - It's possible for a submodule to be valid without having submodule.superprojectGitDir set

makes me feel more comfortable with the idea of submodules learning additional behavior based on this config.

I'm not very confident in our ability to ensure that *every* submodule has this config set. The series covers a few paths for introducing that config, which I hope cover most cases:

 - "git submodule update" (which seems to be part of the "git submodule init" flow)
 - "git submodule absorbgitdirs", to convert a "git init"'d repo into a submodule

Notably, we can only really set this config when 'the_repository' is the superproject. That appears to be the only time when we know the gitdirs of both the superproject and the submodule.
I expect folks will have a lot to say about this, so I look forward to discussion :)

 - Emily

1: https://lore.kernel.org/git/20210423001539.4059524-1-emilyshaffer@xxxxxxxxxx
2: https://lore.kernel.org/git/YHofmWcIAidkvJiD@xxxxxxxxxx

Emily Shaffer (4):
  t7400-submodule-basic: modernize inspect() helper
  introduce submodule.superprojectGitDir cache
  submodule: cache superproject gitdir during absorbgitdirs
  submodule: cache superproject gitdir during 'update'

 builtin/submodule--helper.c        |  4 +++
 git-submodule.sh                   |  9 ++++++
 submodule.c                        | 10 ++++++
 t/t7400-submodule-basic.sh         | 49 ++++++++++++++----------------
 t/t7406-submodule-update.sh        | 10 ++++++
 t/t7412-submodule-absorbgitdirs.sh |  1 +
 6 files changed, 57 insertions(+), 26 deletions(-)

-- 
2.32.0.272.g935e593368-goog