Original cover letter: https://lore.kernel.org/git/20220210044152.78352-1-chooglen@xxxxxxxxxx This round of patches is now based on master - I prepared the previous rounds on top of gc/branch-recurse-submodules, but now that's merged onto master (and the actual branch for this series, gc/recursive-fetch-with-unused-submodules, is based off master anyway). This round fixes up the comments from the previous round (thanks everyone!), all of which are fairly small. = Patch organization - Patches 1-3 are quality-of-life improvements to the test suite that make it easier to write the tests in patch 9. - Patches 4-6 are preparation for "git fetch" to read .gitmodules from the superproject commit in patch 7. - Patches 7-8 refactor out the logic of "finding which submodules to fetch" and "fetching the submodules", making it easier to tell "git fetch" to fetch unpopulated submodules. - Patch 9 teaches "git fetch" to fetch changed, unpopulated submodules in addition to populated submodules. - Patch 10 is an optional bugfix + cleanup of the "git fetch" code that removes the last caller of the deprecated "add_submodule_odb()". = Changes == Since v3 - Numerous style fixes + improved comments. - Fix sed portability issues. - Fix failing test due to default branch name assumptions. - Patch 3: change a test so that it no longer depends on state from the previous test. - Patch 9: fix memory leak when recording super_oid and path + add explanatory comment. == Since v2 - Numerous small fixes to the code and commit message (thanks to all who helped spot these :)) - In patch 2, use test_cmp + sed to assert on test output, effectively reverting the "use grep" approach of v1-2 (see patch 2's description). - New patch 3: introduce a test helper that creates the expected superproject commit (instead of copy-pasting the code over and over). - I did not get rid of "git fetch" inside the test helper (as Jonathan suggested) though, because that requires a bigger change in the test setup, and I think the test helper makes the test straightforward enough. - New patch 8: refactor some shared logic out into fetch_task_create(). This reduces code duplication between the get_fetch_task_from_* functions. - In patch 9, add additional tests for 'submodules with the same name'. - In patch 9, handle a bug where a submodule that is unpopulated by "git rm" still has "core.worktree" set and cannot be fetched (see patch 9's description). - Remove the "git fetch --update-shallow" patch (I'll try to send it separately). == Since v1 - Numerous style fixes suggested by Jonathan (thanks!) - In patch 3, don't prematurely read submodules from the superproject commit (see: <kl6l5yplyat6.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>). - In patch 7, stop using "git checkout" and "! grep" in tests. - In patch 7, stop doing the "find changed submodules" rev walk unconditionally. Instead, continue to check for .gitmodules, but also check for submodules in $GIT_DIR/modules. - I'm not entirely happy with the helper function name, see "---" for details. - Move "git fetch --update-shallow" bugfix to patch 8. - Because the "find changed submodules" rev walk is no longer unconditional, this fix is no longer needed for tests to pass. - Rename fetch_populated_submodules() to fetch_submodules(). Glen Choo (10): t5526: introduce test helper to assert on fetches t5526: stop asserting on stderr literally t5526: create superproject commits with test helper submodule: make static functions read submodules from commits submodule: inline submodule_commits() into caller submodule: store new submodule commits oid_array in a struct submodule: extract get_fetch_task() submodule: move logic into fetch_task_create() fetch: fetch unpopulated, changed submodules submodule: fix latent check_has_commit() bug Documentation/fetch-options.txt | 26 +- Documentation/git-fetch.txt | 10 +- builtin/fetch.c | 14 +- submodule.c | 442 +++++++++++++++++--------- submodule.h | 21 +- t/t5526-fetch-submodules.sh | 539 ++++++++++++++++++++++++-------- 6 files changed, 740 insertions(+), 312 deletions(-) Range-diff against v3: 1: b6d34b0f5c ! 1: 57cd31afc2 t5526: introduce test helper to assert on fetches @@ t/t5526-fetch-submodules.sh: add_upstream_commit() { +verify_fetch_result() { + ACTUAL_ERR=$1 && + rm -f expect.err.combined && -+ if [ -f expect.err.super ]; then ++ if test -f expect.err.super ++ then + cat expect.err.super >>expect.err.combined + fi && -+ if [ -f expect.err.sub ]; then ++ if test -f expect.err.sub ++ then + cat expect.err.sub >>expect.err.combined + fi && -+ if [ -f expect.err.deep ]; then ++ if test -f expect.err.deep ++ then + cat expect.err.deep >>expect.err.combined + fi && + test_cmp expect.err.combined $ACTUAL_ERR 2: 0b85fa35c2 ! 2: b70c894cff t5526: stop asserting on stderr literally @@ t/t5526-fetch-submodules.sh: export GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB pwd=$(pwd) -+check_sub() { ++check_sub () { + NEW_HEAD=$1 && -+ cat <<-EOF >$pwd/expect.err.sub ++ cat >$pwd/expect.err.sub <<-EOF + Fetching submodule submodule + From $pwd/submodule + OLD_HEAD..$NEW_HEAD sub -> origin/sub + EOF +} + -+check_deep() { ++check_deep () { + NEW_HEAD=$1 && -+ cat <<-EOF >$pwd/expect.err.deep ++ cat >$pwd/expect.err.deep <<-EOF + Fetching submodule submodule/subdir/deepsubmodule + From $pwd/deepsubmodule + OLD_HEAD..$NEW_HEAD deep -> origin/deep + EOF +} + -+check_super() { ++check_super () { + NEW_HEAD=$1 && -+ cat <<-EOF >$pwd/expect.err.super ++ cat >$pwd/expect.err.super <<-EOF + From $pwd/. + OLD_HEAD..$NEW_HEAD super -> origin/super + EOF @@ t/t5526-fetch-submodules.sh: pwd=$(pwd) ) } +@@ t/t5526-fetch-submodules.sh: add_upstream_commit() { + # + # If a repo should not be fetched in the test, its corresponding + # expect.err file should be rm-ed. +-verify_fetch_result() { ++verify_fetch_result () { + ACTUAL_ERR=$1 && + rm -f expect.err.combined && + if test -f expect.err.super @@ t/t5526-fetch-submodules.sh: verify_fetch_result() { - if [ -f expect.err.deep ]; then + then cat expect.err.deep >>expect.err.combined fi && - test_cmp expect.err.combined $ACTUAL_ERR -+ sed -E 's/[0-9a-f]+\.\./OLD_HEAD\.\./' $ACTUAL_ERR >actual.err.cmp && ++ sed -e 's/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./' "$ACTUAL_ERR" >actual.err.cmp && + test_cmp expect.err.combined actual.err.cmp } 3: bb8ef6094a ! 3: 7e2a01164e t5526: create superproject commits with test helper @@ Commit message Signed-off-by: Glen Choo <chooglen@xxxxxxxxxx> ## t/t5526-fetch-submodules.sh ## -@@ t/t5526-fetch-submodules.sh: check_super() { +@@ t/t5526-fetch-submodules.sh: check_super () { # a file that contains the expected err if that new commit were fetched. # These output files get concatenated in the right order by # verify_fetch_result(). -add_upstream_commit() { -+add_submodule_commits() { ++add_submodule_commits () { ( cd submodule && echo new >> subfile && @@ t/t5526-fetch-submodules.sh: add_upstream_commit() { +# +# This requires add_submodule_commits() to be called first, otherwise +# the submodules will not have changed and cannot be "git add"-ed. -+add_superproject_commits() { -+( -+ cd submodule && ++add_superproject_commits () { + ( -+ cd subdir/deepsubmodule && -+ git fetch && -+ git checkout -q FETCH_HEAD -+ ) && ++ cd submodule && ++ ( ++ cd subdir/deepsubmodule && ++ git fetch && ++ git checkout -q FETCH_HEAD ++ ) && + git add subdir/deepsubmodule && + git commit -m "new deep submodule" + ) && 4: e83a1713c4 = 4: 88112ee225 submodule: make static functions read submodules from commits 5: e27d402b9a = 5: 007cd97aba submodule: inline submodule_commits() into caller 6: 1c7c8218b8 = 6: f34ea88fe9 submodule: store new submodule commits oid_array in a struct 7: 80cf317722 ! 7: f66ab663c5 submodule: extract get_fetch_task() @@ submodule.c: struct fetch_task { struct repository *repo; const struct submodule *sub; unsigned free_sub : 1; /* Do we need to free the submodule? */ -+ const char *default_argv; ++ const char *default_argv; /* The default fetch mode. */ struct oid_array *commits; /* Ensure these commits are fetched */ }; 8: bf9cfa7054 = 8: 4e3db1bc9d submodule: move logic into fetch_task_create() 9: c7c2ff71b6 ! 9: 9e7b1c1bbe fetch: fetch unpopulated, changed submodules @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix) ## submodule.c ## @@ submodule.c: static const char *default_name_or_path(const char *path_or_name) - * member of the changed submodule string_list_item. + + /* + * Holds relevant information for a changed submodule. Used as the .util +- * member of the changed submodule string_list_item. ++ * member of the changed submodule name string_list_item. ++ * ++ * (super_oid, path) allows the submodule config to be read from _some_ ++ * .gitmodules file. We store this information the first time we find a ++ * superproject commit that points to the submodule, but this is ++ * arbitrary - we can choose any (super_oid, path) that matches the ++ * submodule's name. */ struct changed_submodule_data { + /* -+ * The first superproject commit in the rev walk that points to the -+ * submodule. ++ * The first superproject commit in the rev walk that points to ++ * the submodule. + */ + const struct object_id *super_oid; + /* @@ submodule.c: struct changed_submodule_data { static void collect_changed_submodules_cb(struct diff_queue_struct *q, @@ submodule.c: static void collect_changed_submodules_cb(struct diff_queue_struct *q, - if (!item->util) + continue; + + item = string_list_insert(changed, name); +- if (!item->util) ++ if (item->util) ++ cs_data = item->util; ++ else { item->util = xcalloc(1, sizeof(struct changed_submodule_data)); - cs_data = item->util; -+ cs_data->super_oid = commit_oid; -+ cs_data->path = xstrdup(p->two->path); +- cs_data = item->util; ++ cs_data = item->util; ++ cs_data->super_oid = commit_oid; ++ cs_data->path = xstrdup(p->two->path); ++ } oid_array_append(&cs_data->new_commits, &p->two->oid); } } @@ submodule.c: void check_for_new_submodule_commits(struct object_id *oid) + */ +static int repo_has_absorbed_submodules(struct repository *r) +{ ++ int ret; + struct strbuf buf = STRBUF_INIT; + + strbuf_repo_git_path(&buf, r, "modules/"); -+ return file_exists(buf.buf) && !is_empty_dir(buf.buf); ++ ret = file_exists(buf.buf) && !is_empty_dir(buf.buf); ++ strbuf_release(&buf); ++ return ret; +} + static void calculate_changed_submodule_paths(struct repository *r, @@ submodule.c: int submodule_touches_in_range(struct repository *r, struct submodule_parallel_fetch { - int count; ++ /* ++ * The index of the last index entry processed by ++ * get_fetch_task_from_index(). ++ */ + int index_count; ++ /* ++ * The index of the last string_list entry processed by ++ * get_fetch_task_from_changed(). ++ */ + int changed_count; struct strvec args; struct repository *r; const char *prefix; @@ submodule.c: struct submodule_parallel_fetch { + int quiet; int result; ++ /* ++ * Names of submodules that have new commits. Generated by ++ * walking the newly fetched superproject commits. ++ */ struct string_list changed_submodule_names; ++ /* ++ * Names of submodules that have already been processed. Lets us ++ * avoid fetching the same submodule more than once. ++ */ + struct string_list seen_submodule_names; /* Pending fetches by OIDs */ @@ submodule.c: struct submodule_parallel_fetch { @@ submodule.c: struct fetch_task { const struct submodule *sub; unsigned free_sub : 1; /* Do we need to free the submodule? */ - const char *default_argv; -+ struct strvec git_args; + const char *default_argv; /* The default fetch mode. */ ++ struct strvec git_args; /* Args for the child git process. */ struct oid_array *commits; /* Ensure these commits are fetched */ }; @@ submodule.c: get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf + + spf->changed_count++; + /* -+ * NEEDSWORK: A submodule unpopulated by "git rm" will -+ * have core.worktree set, but the actual core.worktree -+ * directory won't exist, causing the child process to -+ * fail. Forcibly set --work-tree until we get smarter -+ * handling for core.worktree in unpopulated submodules. ++ * NEEDSWORK: Submodules set/unset a value for ++ * core.worktree when they are populated/unpopulated by ++ * "git checkout" (and similar commands, see ++ * submodule_move_head() and ++ * connect_work_tree_and_git_dir()), but if the ++ * submodule is unpopulated in another way (e.g. "git ++ * rm", "rm -r"), core.worktree will still be set even ++ * though the directory doesn't exist, and the child ++ * process will crash while trying to chdir into the ++ * nonexistent directory. ++ * ++ * In this case, we know that the submodule has no ++ * working tree, so we can work around this by ++ * setting "--work-tree=." (--bare does not work because ++ * worktree settings take precedence over bare-ness). ++ * However, this is not necessarily true in other cases, ++ * so a generalized solution is still necessary. ++ * ++ * Possible solutions: ++ * - teach "git [add|rm]" to unset core.worktree and ++ * discourage users from removing submodules without ++ * using a Git command. ++ * - teach submodule child processes to ignore stale ++ * core.worktree values. + */ + strvec_push(&task->git_args, "--work-tree=."); + return task; @@ submodule.h: int should_update_submodules(void); ## t/t5526-fetch-submodules.sh ## @@ t/t5526-fetch-submodules.sh: pwd=$(pwd) - check_sub() { + check_sub () { NEW_HEAD=$1 && + SUPER_HEAD=$2 && - cat <<-EOF >$pwd/expect.err.sub + cat >$pwd/expect.err.sub <<-EOF - Fetching submodule submodule + Fetching submodule submodule${SUPER_HEAD:+ at commit $SUPER_HEAD} From $pwd/submodule OLD_HEAD..$NEW_HEAD sub -> origin/sub EOF -@@ t/t5526-fetch-submodules.sh: check_sub() { +@@ t/t5526-fetch-submodules.sh: check_sub () { - check_deep() { + check_deep () { NEW_HEAD=$1 && + SUB_HEAD=$2 && - cat <<-EOF >$pwd/expect.err.deep + cat >$pwd/expect.err.deep <<-EOF - Fetching submodule submodule/subdir/deepsubmodule + Fetching submodule submodule/subdir/deepsubmodule${SUB_HEAD:+ at commit $SUB_HEAD} From $pwd/deepsubmodule OLD_HEAD..$NEW_HEAD deep -> origin/deep EOF +@@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' doesn't recurse when no ne + ' + + test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necessary (and ignores config)" ' ++ add_submodule_commits && + add_superproject_commits && + ( + cd downstream && @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-demand' recurses as deep as necess verify_fetch_result actual.err ' @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman + # Use test_cmp manually because verify_fetch_result does not + # consider submodule2. All the repos should be fetched, but only + # submodule2 should be read from a commit -+ cat <<-EOF > expect.err.combined && ++ cat > expect.err.combined <<-EOF && + From $pwd/. + OLD_HEAD..$super_head super -> origin/super + OLD_HEAD..$super_sub2_only_head super-sub2-only -> origin/super-sub2-only @@ t/t5526-fetch-submodules.sh: test_expect_success "'--recurse-submodules=on-deman + From $pwd/submodule2 + OLD_HEAD..$sub2_head sub2 -> origin/sub2 + EOF -+ sed -E "s/[0-9a-f]+\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp && ++ sed -e "s/[0-9a-f][0-9a-f]*\.\./OLD_HEAD\.\./" actual.err >actual.err.cmp && + test_cmp expect.err.combined actual.err.cmp +' + @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a + mkdir same-name-1 && + ( + cd same-name-1 && -+ git init && ++ git init -b main && + test_commit --no-tag a + ) && + git clone same-name-1 same-name-2 && @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a + ( + cd same-name-1 && + mkdir submodule && -+ git -C submodule init && ++ git -C submodule init -b main && + test_commit -C submodule --no-tag a1 && + git submodule add "$pwd/same-name-1/submodule" && + git add submodule && @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a + ( + cd same-name-2 && + mkdir submodule && -+ git -C submodule init && ++ git -C submodule init -b main && + test_commit -C submodule --no-tag a2 && + git submodule add "$pwd/same-name-2/submodule" && + git add submodule && @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a +' + +test_expect_success 'fetch --recurse-submodules updates name-conflicted, populated submodule' ' -+ test_when_finished "git -C same-name-downstream checkout master" && ++ test_when_finished "git -C same-name-downstream checkout main" && + ( + cd same-name-1 && + test_commit -C submodule --no-tag b1 && @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a + cd same-name-downstream && + # even though the .gitmodules is correct, we cannot + # fetch from same-name-2 -+ git checkout same-name-2/master && ++ git checkout same-name-2/main && + git fetch --recurse-submodules same-name-1 && + test_must_fail git fetch --recurse-submodules same-name-2 + ) && @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a + ) && + ( + cd same-name-downstream && -+ git checkout master && ++ git checkout main && + git rm .gitmodules && + git rm submodule && + git commit -m "no submodules" && @@ t/t5526-fetch-submodules.sh: test_expect_success 'recursive fetch after deinit a + ( + cd same-name-downstream/.git/modules/submodule && + # The submodule has core.worktree pointing to the "git -+ # rm"-ed directory, overwrite the invalid value. ++ # rm"-ed directory, overwrite the invalid value. See ++ # comment in get_fetch_task_from_changed() for more ++ # information. + git --work-tree=. cat-file -e $head1 && + test_must_fail git --work-tree=. cat-file -e $head2 + ) 10: e1ac74eee4 = 10: 362ce3c7f8 submodule: fix latent check_has_commit() bug base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7 -- 2.33.GIT