Re: [RFC 1/2] submodules: test for fetch of non-init subsub-repo

On 09.11.20 18:52, Junio C Hamano wrote:
Peter Kaestle <peter.kaestle@xxxxxxxxx> writes:

This test case triggers a regression, which was introduced by
a62387b3fc9f5aeeb04a2db278121d33a9caafa7 in following setup:

Minor nit.  Please refer to a commit like so:

a62387b3 (submodule.c: fetch in submodules git directory instead of in worktree, 2018-11-28)

That is what "git show -s --pretty=reference" gives for the commit.

If you have older git, "--format='%h (%s, %ad)' --date=short" would
work.

Thanks for this hint, this is really useful.
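
For scripts that need to cite commits this way, a minimal sketch follows. The helper name commit_reference is hypothetical (it is not part of git); the two invocations inside it are the ones quoted above, with the second one serving as a fallback on the assumption that an older git rejects the "reference" pretty format.

```shell
# commit_reference is a hypothetical helper name, not a git command.
# It prints "<abbrev-hash> (<subject>, <YYYY-MM-DD>)" for the given commit.
commit_reference () {
	# Preferred form; assumes a git that knows the "reference" format.
	git show -s --pretty=reference "$1" 2>/dev/null ||
	# Fallback spelled out by hand for older versions.
	git show -s --format='%h (%s, %ad)' --date=short "$1"
}
```

Run inside a clone of git.git, "commit_reference a62387b3" should give the line shown in the quoted hint.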


Instead of saying "if you follow this complex thing, it breaks and
it is a regression at there", please describe it as a regular bugfix
log message.  Describe the set-up first, explain the operation you'd
perform under the condition, and tell readers what your expected
outcome is.  Then tell readers what actually happens, and how that
is different from your expected outcome.  Additionally, tell readers
that it used to work before such and such commit broke it and what
the root cause of the breakage is.

Hm... I did this in the cover letter; maybe you missed it, or I did not express myself well enough there.
Anyhow, I'll add it to the commit message, which goes into the log.

Here is my proposal for a new commit message of the test case:

----8<----
A regression has been introduced by 'a62387b (submodule.c: fetch in submodules git directory instead of in worktree, 2018-11-28)'.

The scenario in which it triggers is when one has a remote repository with a subrepository inside a subrepository like this:
superproject/middle_repo/inner_repo

Person A and person B both have a clone of it, but person B is not
working with the inner_repo and thus does not have it initialized in
his working copy.

Now person A introduces a change to the inner_repo and propagates it through the middle_repo and the superproject. Once person A has pushed the changes and person B wants to fetch them using "git fetch" at the superproject level, git fails with the error:

Could not access submodule 'inner_repo'
Errors during submodule fetch:
        middle_repo

The expectation is that in this case the inner submodule is recognized as an uninitialized subrepository and skipped by the git fetch command.

This used to work correctly before 'a62387b (submodule.c: fetch in submodules git directory instead of in worktree, 2018-11-28)'.

Starting with a62387b, the code evaluates "is_empty_dir()" inside .git/modules for a directory that only exists in the worktree, which of course yields the wrong return value.
---->8----
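
The root cause named in the last paragraph of that message can be illustrated with plain directories. This is only a sketch under assumptions: the shell function is_empty_dir below is a stand-in that treats a missing path as "not empty", and the layout (B/middle/inner, B/.git/modules/middle) is a hand-built imitation of person B's clone, not something produced by git.

```shell
# Stand-in for an emptiness check: succeeds only for an existing, empty
# directory; a missing path counts as "not empty".
is_empty_dir () {
	test -d "$1" && test -z "$(ls -A "$1")"
}

# Hand-built imitation of person B's clone (assumed layout, no git involved).
demo=$(mktemp -d) &&
# Uninitialized submodule: an empty directory in the worktree of middle.
mkdir -p "$demo/B/middle/inner" &&
# Git directory of middle: it has no entry called "inner".
mkdir -p "$demo/B/.git/modules/middle"

# Checked relative to the worktree of middle, "inner" looks uninitialized.
(cd "$demo/B/middle" && is_empty_dir inner) &&
	echo "worktree: inner is empty, skip it"

# Checked relative to the git dir of middle, the same name does not exist.
(cd "$demo/B/.git/modules/middle" && is_empty_dir inner) ||
	echo "git dir: inner is not an empty directory, report an error"
```

The same name gives two different answers depending on the directory it is resolved against, which is the mismatch the message describes.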


As for the revert of a62387b, which I proposed in the second patch: I'm not sure it's the right way. The revert was simply my quick approach to fix it. As I'm not fully aware of the idea behind handling the submodules inside .git/modules instead of the worktree, I don't know whether this is the best solution. Maybe rethinking the whole get_next_submodule() algorithm, or simply fixing the is_empty_dir() call to use the worktree path, would be a better solution.
--> We should discuss this.


Which commit broke the set-up is also an interesting piece of
information, but it is not as important in the overall picture.

Also, it probably is a better arrangement, after explaining how the
current system does not work in the log message, to have the code
fix in the same patch and add test to ensure the bug will stay
fixed, in a single patch.  That way, you do not have to start with
expect_failure and then flip the polarity to expect_success, which
is a horrible style for reviewers to understand the code fix because
the second "fix" step does not actually show the effect of what got
fixed in the patch (the test change shows the flip of the polarity
of the test plus only a few context lines and does not show what
behaviour change the "fix" causes).

Ok, will deliver the test and the fix proposal in a single patch.


diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
index dd8e423..9fbd481 100755
--- a/t/t5526-fetch-submodules.sh
+++ b/t/t5526-fetch-submodules.sh
@@ -719,4 +719,42 @@ test_expect_success 'fetch new submodule commit intermittently referenced by sup
  	)
  '
+add_commit_push()
+{

Style.

     add_commit_push () {

ok.


cf. Documentation/CodingGuidelines.

+	dir="$1"
+	msg="$2"
+	shift 2
+	git -C "$dir" add "$@" &&
+	git -C "$dir" commit -a -m "$msg" &&
+	git -C "$dir" push
+}
+
+test_expect_failure 'fetching a superproject containing an uninitialized sub/sub project' '
+	# does not depend on any previous test setups
+
+	for repo in outer middle inner
+	do
+		git init --bare $repo &&
+		git clone $repo ${repo}_content &&
+		echo $repo > ${repo}_content/file &&

Style.

     echo "$repo" >"${repo}_content/file" &&

ok.


cf. Documentation/CodingGuidelines.

+		add_commit_push ${repo}_content "initial" file

If any of these iterations, except for the last one, fails in the
loop, you do not notice the breakage and go on to the next
iteration.  You'd need "|| return 1" at the end, perhaps.

yes, I definitely missed that.
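
The effect is easy to reproduce outside the test suite. In the sketch below the two function names are hypothetical; the point is only that a "for" loop reports the status of its last iteration, so an earlier failure is lost unless the body bails out explicitly.

```shell
# Without "|| return 1": only the status of the LAST iteration survives,
# so the failure on "bad" goes unnoticed.
loop_unchecked () {
	for item in bad good
	do
		test "$item" = good
	done
}

# With "|| return 1": the first failing iteration aborts the function.
loop_checked () {
	for item in bad good
	do
		test "$item" = good || return 1
	done
}

if loop_unchecked
then
	echo "unchecked loop: failure on the first item was swallowed"
fi
if ! loop_checked
then
	echo "checked loop: failure on the first item was reported"
fi
```

"return" is usable here because the loop sits inside a shell function; the suggestion quoted above implies the same holds for the body of a test in the suite.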


So far, you created three bare repositories called outer, middle and
inner, and each of {outer,middle,inner}_content repositories is a
copy with a working tree of its counterpart.

+	done &&
+
+	git clone outer A &&
+	git -C A submodule add "$pwd/middle" &&
+	git -C A/middle/ submodule add "$pwd/inner" &&

Hmph.  Is it essential to name these directories with full pathname
for the problem to reproduce, or would the issue also appear if
these repositories refer to each other with relative pathnames?
Just being curious---if it only breaks with one and succeeds with
the other, that deserves commenting here.

I haven't tried that, as the case was intended to simulate an environment with remote repositories. With remote repositories you have a URL, which is a kind of absolute path. From reading the failing code, I doubt that it really matters.


So far, you created A that is "outer", added "middle" as its
submodule and then added "inner" as a submodule of "middle".

Although it is not wrong per-se, it somehow feels a bit unnatural
that you didn't do all of the above in the working trees you created
in the previous step---I would have expected that middle_content
working tree would be used to add "inner" as its submodule, for
example.

I'm not sure I got your concern. Maybe this description of the scenario we want to mimic helps: the "bare" repos outer, middle and inner are created by an administrator on a remote server. Person A prepares the split of the sources for all other users working in the environment by adding the submodules in the way specified by the software architecture we intend to develop in.


+	add_commit_push A/middle/ "adding inner sub" .gitmodules inner &&
+	add_commit_push A/ "adding middle sub" .gitmodules middle &&

And then you conclude the addition of submodules by recording each
of these two "submodule add" events in a commit and push it out.

+	git clone outer B &&
+	git -C B/ submodule update --init middle &&

And then you clone the outer thing (which does not recursively
instantiate) from A, and instantiate the middle layer (which does
not recursively instantiate the bottom layer, I presume?)

Yes, person B clones the outer layer without recursing into all the submodules, initializing only the ones he is expected to work on. In the test scenario he works only on the middle layer, not on the inner one.


I _think_ the state here should be minimally validated in this test.

Of course we could do so. My intention was to keep it focused on the one thing we needed to test, namely the fetch of an outer repo with an uninitialized sub-sub repo.


If you expect 'outer' and 'middle' are instantiated, perhaps check
its contents (e.g. do you have a thing called 'file'?  What does it
have in it?) and check the commit (e.g. does 'rev-parse HEAD' give
you the commit you expect?).  If you expect 'inner' is not
instantiated at this point, that should be validated as well.  If
anything, that would explain what your expectations are better than
any word in the proposed log message.

In any case, I presume that up to this point things work as expected
with or without the "fix" patch?  If so, the usual way we structure
these tests is to stop here and make that a single "setup" test.
Start the whole sequence above like so, perhaps.

     test_expect_success 'setup nested submodule fetch test' '
		...

Ok, got it, will refactor.


And then the "interesting" part of the test.

+	echo "change on inner repo of A" > A/middle/inner/file &&

Style.

ok.


+	add_commit_push A/middle/inner "change on inner" file &&
+	add_commit_push A/middle "change on inner" inner &&
+	add_commit_push A "change on inner" middle &&

So you create a new commit in the bottom layer, propagate it up to
the middle layer, and to the outer layer.  Are these steps also what
you expect to succeed, or does the "regression" break any of these?
If these are still part of set-up that is expected to work, you
probably need to roll these up to the 'setup' step (with some
validation to express what the tests are expecting). From your
description, which did not say where exactly in this long sequence
you expect things to break, unfortunately no reader can tell, so
I'll leave the restructuring up to you.

Yes, those steps are also expected to succeed; it's just important that the initial clone of B happens before those pushes. With your proposed restructuring they could also go into the setup step, leaving only a single command for the actual test to fail:


+
+	git -C B/ fetch

And from B that was an original copy of A with only the top and
middle layer instantiated, you run "git fetch".  Are you happy as
long as "git fetch" does not exit with non-zero status?  That is
hard to believe---it may be a necessary condition for the command to
exit with zero status, but you have other expectations, like what
commit the remote tracking branch refs/remotes/origin/HEAD ought to
be pointing at.  I think we should check that, too.

Checking the return code is the one thing that catches this regression, but checking whether all the repositories are at the correct HEAD is something we probably want as well, for testing future changes to that part of the code. Will add it.
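
One way such a check could look, as a rough sketch only: same_commit is a hypothetical helper (neither part of git nor of the test library), and which refs to compare in the final test is an assumption that still has to be settled for v2.

```shell
# same_commit is a hypothetical helper: it succeeds when <rev-a> in
# <repo-a> and <rev-b> in <repo-b> resolve to the same object name.
# Usage: same_commit <repo-a> <rev-a> <repo-b> <rev-b>
same_commit () {
	a=$(git -C "$1" rev-parse --verify "$2") &&
	b=$(git -C "$3" rev-parse --verify "$4") &&
	test "$a" = "$b"
}
```

After "git -C B fetch", a call such as "same_commit A HEAD B refs/remotes/origin/HEAD" would then express the expectation about the remote-tracking ref mentioned in the review.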

Thank you very much for all the comments; I learned a lot by working through them. I'll send a patch v2 soon.

--
kind regards,
--peter;


