Re: [PATCH v2] submodules: fix of regression on fetching of non-init subsub-repo

Peter Kästle <peter.kaestle@xxxxxxxxx> · Tue, 8 Dec 2020 15:06:08 +0100

On 07.12.20 19:42, Philippe Blain wrote:
Hi Peter,

Le 7 déc. 2020 à 08:46, Peter Kaestle <peter.kaestle@xxxxxxxxx> a écrit :

A regression has been introduced by a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28).

The scenario in which it triggers is when one has a remote repository
with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo

The correct terminology is "submodule", not "subrepository".

Also, (minor point) I would just write "when one has a repository",
as its simpler (the repository by itself is not "remote", it is only "remote"
in relation the repositories that are cloned from it).

ok.



Person A and B have both a clone of it, while Person B is not working
with the inner_repo and thus does not have it initialized in his working
copy.

Now person A introduces a change to the inner_repo and propagates it
through the middle_repo and the superproject.

Once person A pushed the changes and person B wants to fetch them using
"git fetch" on superproject level,

s/on/at the/

ok.


B's git call will return with error
saying:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
         middle_repo

Expectation is that in this case the inner submodule will be recognized
as uninitialized subrepository and skipped by the git fetch command.

here again, terminology: "as an uninitialized submodule"

ok.


This used to work correctly before 'a62387b (submodule.c: fetch in
submodules git directory instead of in worktree, 2018-11-28)'.

Starting with a62387b the code wants to evaluate "is_empty_dir()" inside
.git/modules for a directory only existing in the worktree, delivering
then of course wrong return value.

This patch ensures is_empty_dir() is getting the correct path of the
uninitialized submodule by concatenation of the actual worktree and the
name of the uninitialized submodule.

Furthermore a regression test case is added, which tests for recursive
fetches on a superproject with uninitialized sub repositories.
  This
issue was leading to an infinite loop when doing a revert of a62387b.

I would maybe add more details here, something like the following
(we can cite your previous attempt, because it was merged to 'master'):

The first attempt to fix this regression, in 1b7ac4e6d4 (submodules:
fix of regression on fetching of non-init subsub-repo, 2020-11-12), by simply
reverting a62387b, resulted in
an infinite loop of submodule fetches in the simpler case of a recursive fetch of a superproject with
uninitialized submodules, and so this commit was reverted in 7091499bc0 (Revert
"submodules: fix of regression on fetching of non-init subsub-repo", 2020-12-02).
To prevent future breakages, also add a regression test for this scenario.

Jip, I like that.



Signed-off-by: Peter Kaestle <peter.kaestle@xxxxxxxxx>
CC: Junio C Hamano <gitster@xxxxxxxxx>
CC: Philippe Blain <levraiphilippeblain@xxxxxxxxx>
CC: Ralf Thielow <ralf.thielow@xxxxxxxxx>
CC: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
---
submodule.c                 |   7 ++-
t/t5526-fetch-submodules.sh | 104 ++++++++++++++++++++++++++++++++++++
2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index b3bb59f066..b561445329 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1477,6 +1477,7 @@ static int get_next_submodule(struct child_process *cp,
			strbuf_release(&submodule_prefix);
			return 1;
		} else {
+			struct strbuf empty_submodule_path = STRBUF_INIT;

			fetch_task_release(task);
			free(task);
@@ -1485,13 +1486,17 @@ static int get_next_submodule(struct child_process *cp,
			 * An empty directory is normal,
			 * the submodule is not initialized
			 */
+			strbuf_addf(&empty_submodule_path, "%s/%s/",
+							spf->r->worktree,
+							ce->name);
			if (S_ISGITLINK(ce->ce_mode) &&
-			    !is_empty_dir(ce->name)) {
+			    !is_empty_dir(empty_submodule_path.buf)) {
				spf->result = 1;
				strbuf_addf(err,
					    _("Could not access submodule '%s'\n"),
					    ce->name);
			}
+			strbuf_release(&empty_submodule_path);
		}
	}


Maybe a personal preference, but I would have gone for something a little simpler, like the following:


diff --git a/submodule.c b/submodule.c
index b3bb59f066..4200865174 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1486,7 +1486,7 @@ static int get_next_submodule(struct child_process *cp,
                          * the submodule is not initialized
                          */
                         if (S_ISGITLINK(ce->ce_mode) &&
-                           !is_empty_dir(ce->name)) {
+                           !is_empty_dir(repo_worktree_path(spf->r, "%s", ce->name))) {

I'm not deep enough into the git code to judge which approach is the 
better one.  From my perspective, being a foreigner to the git code, I 
like my proposed code more, as for me it's much easier to understand 
what's happening by having a meaningful variable name and without being 
forced to dig into outer functions first.
Also Junio C Hamano <gitster@xxxxxxxxx> is having some concerns, which I 
can't judge:

But then you leak the return value from repo_worktree_path(), no?

Thus for v3 I'll stick to my proposal and when you'll review it, please 
discuss with each other whether I should go for a v4 using 
repo_worktree_path().

[...]

+
+test_expect_success 'setup recursive fetch with uninit submodule' '
+	# does not depend on any previous test setups
+
+	git init main &&
+	git init sub &&
+
+	>sub/file &&
+	git -C sub add file &&
+	git -C sub commit -m "add file" &&
+	git -C sub rev-parse HEAD >expect &&
+
+	git -C main submodule add ../sub &&
+	git -C main submodule init &&
+	git -C main submodule update --checkout &&

These two steps are unnecessary as they are implicitly done by 'git submodule add'.
I think we could reflect real life a little bit more by cloning the superproject, and running
the 'recursive fetch with uninit submodule' test below in the clone.

Yes, you're right, "...init" and "...update..." can be removed.


+	git -C main submodule status >out &&
+	sed -e "s/^ //" -e "s/ sub .*$//" out >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'recursive fetch with uninit submodule' '
+	# depends on previous test for setup
+
+	git -C main submodule deinit -f sub &&

Here you are deiniting the submodule, such that
the Git directory will stay in .git/modules/sub. This is not the same thing
as a submodule that was never initialized ("uninitialized"), for which .git/modules/sub
will not yet exist. So maybe we could harden the tests by also testing
for that scenario ? I don't know... maybe the infinite loop only happens
if .git/modules/sub actually already exists. If so, the test name should be
"recursive fetch with deinitialized submodule", I think.

I added another test case for v3, which checks for this in case of never 
initialized submodule.  When executing the test, I can see that the 
infinite loop regression only occurs after doing the init followed by a 
deinit.  Thus renaming the test accordingly.

--
best regards
--peter;