Re: [PATCH v2] submodules: fix of regression on fetching of non-init subsub-repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Peter,

> Le 7 déc. 2020 à 08:46, Peter Kaestle <peter.kaestle@xxxxxxxxx> a écrit :
> 
> A regression has been introduced by a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28).
> 
> The scenario in which it triggers is when one has a remote repository
> with a subrepository inside a subrepository like this:
> superproject/middle_repo/inner_repo

The correct terminology is "submodule", not "subrepository". 

Also, (minor point) I would just write "when one has a repository", 
as its simpler (the repository by itself is not "remote", it is only "remote" 
in relation the repositories that are cloned from it).

> Person A and B have both a clone of it, while Person B is not working
> with the inner_repo and thus does not have it initialized in his working
> copy.
> 
> Now person A introduces a change to the inner_repo and propagates it
> through the middle_repo and the superproject.
> 
> Once person A pushed the changes and person B wants to fetch them using
> "git fetch" on superproject level,

s/on/at the/

> B's git call will return with error
> saying:
> 
> Could not access submodule 'inner_repo'
> Errors during submodule fetch:
>         middle_repo
> 
> Expectation is that in this case the inner submodule will be recognized
> as uninitialized subrepository and skipped by the git fetch command.

here again, terminology: "as an uninitialized submodule" 

> This used to work correctly before 'a62387b (submodule.c: fetch in
> submodules git directory instead of in worktree, 2018-11-28)'.
> 
> Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
> .git/modules for a directory only existing in the worktree, delivering
> then of course wrong return value.
> 
> This patch ensures is_empty_dir() is getting the correct path of the
> uninitialized submodule by concatenation of the actual worktree and the
> name of the uninitialized submodule.
> 
> Furthermore a regression test case is added, which tests for recursive
> fetches on a superproject with uninitialized sub repositories.
>  This
> issue was leading to an infinite loop when doing a revert of a62387b.

I would maybe add more details here, something like the following 
(we can cite your previous attempt, because it was merged to 'master'):

The first attempt to fix this regression, in 1b7ac4e6d4 (submodules: 
fix of regression on fetching of non-init subsub-repo, 2020-11-12), by simply
reverting a62387b, resulted in
an infinite loop of submodule fetches in the simpler case of a recursive fetch of a superproject with
uninitialized submodules, and so this commit was reverted in 7091499bc0 (Revert 
"submodules: fix of regression on fetching of non-init subsub-repo", 2020-12-02).
To prevent future breakages, also add a regression test for this scenario.

> 
> Signed-off-by: Peter Kaestle <peter.kaestle@xxxxxxxxx>
> CC: Junio C Hamano <gitster@xxxxxxxxx>
> CC: Philippe Blain <levraiphilippeblain@xxxxxxxxx>
> CC: Ralf Thielow <ralf.thielow@xxxxxxxxx>
> CC: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
> ---
> submodule.c                 |   7 ++-
> t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/submodule.c b/submodule.c
> index b3bb59f066..b561445329 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
> 			strbuf_release(&submodule_prefix);
> 			return 1;
> 		} else {
> +			struct strbuf empty_submodule_path = STRBUF_INIT;
> 
> 			fetch_task_release(task);
> 			free(task);
> @@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
> 			 * An empty directory is normal,
> 			 * the submodule is not initialized
> 			 */
> +			strbuf_addf(&empty_submodule_path, "%s/%s/",
> +							spf->r->worktree,
> +							ce->name);
> 			if (S_ISGITLINK(ce->ce_mode) &&
> -			    !is_empty_dir(ce->name)) {
> +			    !is_empty_dir(empty_submodule_path.buf)) {
> 				spf->result = 1;
> 				strbuf_addf(err,
> 					    _("Could not access submodule '%s'\n"),
> 					    ce->name);
> 			}
> +			strbuf_release(&empty_submodule_path);
> 		}
> 	}


Maybe a personal preference, but I would have gone for something a little simpler, like the following:


diff --git a/submodule.c b/submodule.c
index b3bb59f066..4200865174 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
                         * the submodule is not initialized
                         */
                        if (S_ISGITLINK(ce->ce_mode) &&
-                           !is_empty_dir(ce->name)) {
+                           !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {
                                spf->result = 1;
                                strbuf_addf(err,
                                            _("Could not access submodule '%s'\n"),

> diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> index dd8e423d25..666dd1e2b7 100755
> --- a/t/t5526-fetch-submodules.sh
> +++ b/t/t5526-fetch-submodules.sh
> @@ -719,4 +719,108 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
> 	)
> '
> 
> +add_commit_push () {
> +	dir="$1" &&
> +	msg="$2" &&
> +	shift 2 &&
> +	git -C "$dir" add "$@" &&
> +	git -C "$dir" commit -a -m "$msg" &&
> +	git -C "$dir" push
> +}
> +
> +compare_refs_in_dir () {
> +	fail= &&
> +	if test "x$1" = 'x!'
> +	then
> +		fail='!' &&
> +		shift
> +	fi &&
> +	git -C "$1" rev-parse --verify "$2" >expect &&
> +	git -C "$3" rev-parse --verify "$4" >actual &&
> +	eval $fail test_cmp expect actual
> +}
> +
> +
> +test_expect_success 'setup nested submodule fetch test' '
> +	# does not depend on any previous test setups
> +
> +	for repo in outer middle inner
> +	do
> +		git init --bare $repo &&
> +		git clone $repo ${repo}_content &&
> +		echo "$repo" >"${repo}_content/file" &&
> +		add_commit_push ${repo}_content "initial" file ||
> +		return 1
> +	done &&
> +
> +	git clone outer A &&
> +	git -C A submodule add "$pwd/middle" &&
> +	git -C A/middle/ submodule add "$pwd/inner" &&
> +	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
> +	add_commit_push A/ "adding middle sub" .gitmodules middle &&
> +
> +	git clone outer B &&
> +	git -C B/ submodule update --init middle &&
> +
> +	compare_refs_in_dir A HEAD B HEAD &&
> +	compare_refs_in_dir A/middle HEAD B/middle HEAD &&
> +	test_path_is_file B/file &&
> +	test_path_is_file B/middle/file &&
> +	test_path_is_missing B/middle/inner/file &&
> +
> +	echo "change on inner repo of A" >"A/middle/inner/file" &&
> +	add_commit_push A/middle/inner "change on inner" file &&
> +	add_commit_push A/middle "change on inner" inner &&
> +	add_commit_push A "change on inner" middle
> +'
> +
> +test_expect_success 'fetching a superproject containing an uninitialized sub/sub project' '
> +	# depends on previous test for setup
> +
> +	git -C B/ fetch &&
> +	compare_refs_in_dir A origin/master B origin/master
> +'
> +
> +
> +test_expect_success 'setup recursive fetch with uninit submodule' '
> +	# does not depend on any previous test setups
> +
> +	git init main &&
> +	git init sub &&
> +
> +	>sub/file &&
> +	git -C sub add file &&
> +	git -C sub commit -m "add file" &&
> +	git -C sub rev-parse HEAD >expect &&
> +
> +	git -C main submodule add ../sub &&
> +	git -C main submodule init &&
> +	git -C main submodule update --checkout &&

These two steps are unnecessary as they are implicitly done by 'git submodule add'.
I think we could reflect real life a little bit more by cloning the superproject, and running
the 'recursive fetch with uninit submodule' test below in the clone.

> +	git -C main submodule status >out &&
> +	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'recursive fetch with uninit submodule' '
> +	# depends on previous test for setup
> +
> +	git -C main submodule deinit -f sub &&

Here you are deiniting the submodule, such that 
the Git directory will stay in .git/modules/sub. This is not the same thing
as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
will not yet exist. So maybe we could harden the tests by also testing
for that scenario ? I don't know... maybe the infinite loop only happens
if .git/modules/sub actually already exists. If so, the test name should be
"recursive fetch with deinitialized submodule", I think.

> +
> +	# In a regression the following git call will run into infinite recursion.
> +	# To handle that, we connect the grep command to the git call by a pipe
> +	# so that grep can kill the infinite recusion when detected.
> +	# The recursion creates git output like:
> +	# Fetching submodule sub
> +	# Fetching submodule sub/sub              <-- [1]
> +	# Fetching submodule sub/sub/sub
> +	# ...
> +	# [1] grep will trigger here and kill git by exiting and closing its stdin
> +
> +	! git -C main fetch --recurse-submodules 2>&1 |
> +		grep -v -m1 "Fetching submodule sub$" &&
> +	git -C main submodule status >out &&
> +	sed -e "s/^-//" -e "s/ sub$//" out >actual &&
> +	test_cmp expect actual
> +'
> +
> test_done


Thanks for working on that, and sorry for not having the time to comment before
you sent v2.

Cheers,

Philippe.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux