From: Glen Choo <chooglen@xxxxxxxxxx> Demonstrate, using tests, an inconsistency in how Git treats repeated configuration keys in .gitmodules depending on whether we read it from the working tree or from an object. Do not attempt to fix it yet, because we don't know how much test coverage we have here, and what the 'right' fix is. When a .gitmodules file contains multiple configurations for a submodule like so: [submodule "sub"] path = path1 path = path2 It's clearly misconfigured, but our docs don't state what we do in this situation. If one checks this with "test-tool submodule-config", you'd see that we ignore every value after the first (aka first-one-wins) and issue a warning. *But* if you actually tried this with "git submodule", you'd find it practically impossible to trigger this behavior - what you actually see is last-one-wins! The difference between the two is somewhat complicated because there are two factors at play. The first is a probable bug in how parse_config_parameter.overwrite affects the way submodule config is cached. In submodule-config.c:parse_config(), when ".overwrite = 1", the submodule machinery gladly overwrites the existing value (last-one-wins) instead of issuing the warning (first-one-wins). This is probably a bug because it seems that .overwrite is intended to overwrite cached values from a previous .gitmodules (e.g. if we read .gitmodules from the index and want to overwrite it with a newer version), not to overwrite values that we read in the same file. The second factor is that the two are reading differently cached values: "git submodule" is reading cached values with ".overwrite = 1", but test-tool is reading from cached values with ".overwrite = 0". The submodule cache stores each submodule config based on the submodule name and the .gitmodules oid it was read from (null_oid() if it's from the working tree). Both code paths eventually call repo_read_gitmodules() to eagerly cache .gitmodules from the working tree, and which happens to use ".overwrite = 1". "git submodule" typically passes null_oid(), which reads back this value. However, test-tool reads back values from the actual .gitmodules oid. This causes a cache miss, but when we try to populate the cache at that oid, we do it with ".overwrite = 0", causing the difference in behavior. To make this behavior easy to demonstrate, I've opted to teach test-tool how to use null_oid() rather than use a real Git command, but this is almost certainly affecting us in real Git. It's probably flying under the radar due to some combination of submodule_from_[path|name]() being relatively uncommon in the codebase, and such misconfigurations being rare in practice. We could fix this bug by targeting either of these factors: - Make ".overwrite = 1" and ".overwrite = 0" do the same thing with repeated values in a .gitmodules. - Remove the eager caching (repo_read_gitmodules()). It seems like we can lazily populate the cache on a miss, so we might not need this. But I'm not sure how long this has been around, and whether users have come to expect one over the other, so I've opted not to send a fix yet. Signed-off-by: Glen Choo <chooglen@xxxxxxxxxx> --- submodule: show inconsistent .gitmodules precedence While I was writing a .gitmodules parser for jj (https://github.com/martinvonz/jj, check it out, it's great!), a reviewer asked what would happen if a submodule had repeated fields (like .path). It turns out that the answer isn't just undefined (it's nowhere in the docs), it's inconsistent! Here's a bug report patch that demonstrates the issue using test-tool. I've stopped short of sending a fix because 1) I'm frankly not sure what behavior users have come to rely on 2) this problem is multi-faceted, so we could fix this in quite a few ways, but I'm not sure which way is 'right'. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1538%2Fchooglen%2Fpush-lzmyuzkpxxxq-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1538/chooglen/push-lzmyuzkpxxxq-v1 Pull-Request: https://github.com/git/git/pull/1538 submodule-config.c | 7 +++++++ t/helper/test-submodule-config.c | 22 +++++++++++++++++++++- t/t7411-submodule-config.sh | 22 ++++++++++++++++++++++ 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/submodule-config.c b/submodule-config.c index 7eb7a0d88d2..1c4b5afa8e4 100644 --- a/submodule-config.c +++ b/submodule-config.c @@ -441,6 +441,13 @@ static int parse_config(const char *var, const char *value, void *data) me->gitmodules_oid, name.buf); + /* + * FIXME me->overwrite=1 is only meant to overwrite existing submodule + * configurations when we're reading from another .gitmodules (e.g. from + * another commit), but it also unintentionally changes behavior when + * there are multiple configurations in a single .gitmodules - instead + * of respecting the first value, we now respect the last value. + */ if (!strcmp(item.buf, "path")) { if (!value) ret = config_error_nonbool(var); diff --git a/t/helper/test-submodule-config.c b/t/helper/test-submodule-config.c index 9df2f03ac80..1bb1dc45878 100644 --- a/t/helper/test-submodule-config.c +++ b/t/helper/test-submodule-config.c @@ -29,11 +29,31 @@ int cmd__submodule_config(int argc, const char **argv) my_argc--; } - if (my_argc % 2 != 0) + if (my_argc > 1 && my_argc % 2 != 0) die_usage(argc, argv, "Wrong number of arguments."); setup_git_directory(); + if (my_argc == 1) { + const struct submodule *submodule; + const char *path_or_name; + + path_or_name = arg[0]; + if (lookup_name) { + submodule = submodule_from_name(the_repository, + null_oid(), path_or_name); + } else + submodule = submodule_from_path(the_repository, + null_oid(), path_or_name); + if (!submodule) + die_usage(argc, argv, "Submodule not found."); + + printf("Submodule name: '%s' for path '%s'\n", submodule->name, + submodule->path); + + return 0; + } + while (*arg) { struct object_id commit_oid; const struct submodule *submodule; diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh index c0167944abd..b67eea7e085 100755 --- a/t/t7411-submodule-config.sh +++ b/t/t7411-submodule-config.sh @@ -258,4 +258,26 @@ test_expect_success 'reading nested submodules config when .gitmodules is not in ) ' +test_expect_success 'multiple config fields in .gitmodules' ' + test_when_finished "rm -fr super-duplicate" && + mkdir super-duplicate && + (cd super-duplicate && + git init && + git submodule add ../submodule && + git config --file .gitmodules --add submodule.submodule.path ignored && + git config --file .gitmodules --add submodule.submodule.url ignored && + git add .gitmodules && + git commit -m "duplicate fields in .gitmodules" && + test-tool submodule-config HEAD submodule >actual 2>warning && + grep "Skipping second one" warning && + echo "Submodule name: ${SQ}submodule${SQ} for path ${SQ}submodule${SQ}" >expect && + test_cmp expect actual && + # FIXME this should give the same result as "HEAD", but there is + # a bug where if we use null_oid() instead of the real commit + # id, the second .path gets respected instead of the first. + test_must_fail test-tool submodule-config submodule 2>null-oid-error && + grep "Submodule not found" null-oid-error + ) +' + test_done base-commit: 6ff334181cfb6485d3ba50843038209a2a253907 -- gitgitgadget