Re: [PATCH 2/2] diff tests: rewrite flakyness-causing test "aid"

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 14 Apr 2021 12:08:23 +0200

On Wed, Apr 14 2021, Junio C Hamano wrote:

["tl;dr" below]:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>>>> Thus if we do:
>>>>
>>>>     git something >expected &&
>>>>     test_cmp expected actual &&
>>>>     rm expected actual
>>>
>>> Isn't it a poor example to use to argue for your particular change,
>>> where $actual in the original is designed to be unique among tests,
>>> in order to ensure that $actual files left after test pieces fail
>>> would not interfere with the tests that come later?  IOW, there is
>>> not a reason to remove $actual until the end of the test sequence,
>>> is there?
>>
>> Not really, because you needed to read the rest of the test file to come
>> to that conclusion.
>>
>> The point of using a helper that guarantees cleanup such as
>> test_when_finished or test_config over manual "git config" or "git rm"
>> isn't that we can prove that we need it because a later test needs the
>> cleanup, but that anyone can add new tests or functionality without
>> having to worry about cleaning up the existing trash directory.
>>
>> So yes, it's not needed here, but that's only because we know the rest
>> of the tests don't have e.g. a test that does a:
>
> In this particular case, $actual files are designed to be left
> behind for failed test pieces, so that the tester can come back and
> inspect them.  I probably should have said it a bit more strongly
> than "there is not a reason to remove".  You SHOULD NOT remove and
> that is why we had "check and then remove only upon success" there,
> instead of test_when_finished.  We want them left for and only for
> failing test pieces.

Yes, it's clear that it's designed to do that. I'm not disagreeing on
the intent of your commit in 2006 to set it up this way.

What I am saying is that it's incompatible to have:

 1. Failing tests
 2. Not removing scratch files that would otherwise be removed

And:

 3. Knowing that the rest of the tests pass in the case of #1 without
    reading them all.

Hence the suggestion that we should use test_when_finished without
exception for such patterns.

Because consistently using the helper would allow us to smartly get #1
and #2 without #3, i.e. some "copy them for later analysis" as suggested
upthread in:
http://lore.kernel.org/git/874kg92xn0.fsf@xxxxxxxxxxxxxxxxxxx

> Please do not advocate for and encourage newbies who would be
> reading the discussion from sidelines to use test_when_finished out
> of dogmatic principle without thinking.

Is Matheus the newbie here? I think he's contributed enough to form his
own opinion.

I think it's clear to anyone else reading from the sidelines that we're
having some informed disagreement about a finer point of patterns in the
test suite.

I doubt either one of us is likely to have much of an impact on newbies
reading from the sidelines.

In any case, I don't see how you're able to read this thread and come to
the conclusion that I'm proposing the use of test_when_finished out of
"[some] dogmatic principle".

I'm not. I'm proposing to use it because I think it makes sense, and I
think the reasons you've noted for avoiding it in this case have more
downsides than upsites, as noted in the #1-3 examples above.

> Even though there are valid cases where test_when_finished is the
> perfect fit, in this particular case, use of it is a clear regression.

Well, I disagree with you.

I think even if the patch I was proposing was a mere conversion to
test_when_finished as would be typical for most newly written tests,
i.e. a diff like this:

	diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
	index 6cca8b84a6b..d0031aa0f7b 100755
	--- a/t/t4013-diff-various.sh
	+++ b/t/t4013-diff-various.sh
	@@ -204,6 +204,7 @@ do
	 	actual="$pfx-diff.$test"

	 	test_expect_$status "git $cmd # magic is ${magic:-(not used)}" '
	+		test_when_finished "rm \"$actual\"" &&
	 		{
	 			echo "$ git $cmd"
	 			case "$magic" in
	@@ -216,22 +217,11 @@ do
	 			    -e "s/^\\(.*mixed; boundary=\"-*\\)$V\\(-*\\)\"\$/\\1g-i-t--v-e-r-s-i-o-n\2\"/"
	 			echo "\$"
	 		} >"$actual" &&
	-		if test -f "$expect"
	-		then
	-			process_diffs "$actual" >actual &&
	-			process_diffs "$expect" >expect &&
	-			case $cmd in
	-			*format-patch* | *-stat*)
	-				test_cmp expect actual;;
	-			*)
	-				test_cmp expect actual;;
	-			esac &&
	-			rm -f "$actual" actual expect
	-		else
	-			# this is to help developing new tests.
	-			cp "$actual" "$expect"
	-			false
	-		fi
	+		test_when_finished "rm actual" &&
	+		process_diffs "$actual" >actual &&
	+		test_when_finished "rm expect" &&
	+		process_diffs "$expect" >expect &&
	+		test_cmp expect actual
	 	'
	 done <<\EOF
	 diff-tree initial

It's a clear improvement, because, on current master, emulating the case
when you add a new test-case:

    $ rm t4013/diff.log_-GF_-p_--pickaxe-all_master

Now run it once:

    $ ./t4013-diff-various.sh
    [...]
    # failed 1 among 199 test(s)

And then to debug it:

    $ ./t4013-diff-various.sh -vixd
    [...]
    # passed all 199 test(s)

So because we copied the file around the only get the failure the first
time around.

You might rightly say that we could have narrowly just fixed that
particular bug and kept the old "copy and return false", so something
like:

	diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
	index 6cca8b84a6b..6f0f3c7f53c 100755
	--- a/t/t4013-diff-various.sh
	+++ b/t/t4013-diff-various.sh
	@@ -229,7 +229,7 @@ do
	 			rm -f "$actual" actual expect
	 		else
	 			# this is to help developing new tests.
	-			cp "$actual" "$expect"
	+			cp "$actual" "$expect.MAYBE-USE-THIS"
	 			false
	 		fi
	 	'

But as noted above that gets us into needing to worry about cascading
failure. I.e. any new test added later in the file needs to think about
"what if the t4013 directory is in this state?".

We also document what we'll do and how we'll do it in t/README in this
case:

	--immediate::
		This causes the test to immediately exit upon the first
		failed test. Cleanup commands requested with
		test_when_finished are not executed if the test failed,
		in order to keep the state for inspection by the tester
		to diagnose the bug.

So it's much more obvious and consistent to have a failure that ends on
the test_cmp and being able to see the test_when_finished for the
intermediate files in the trace output.

The hypothetical newbie might have debugged such a pattern already, but
not having it end in a special-case used only for this test:

    [...]
    + test -f /home/avar/g/git/t/t4013/diff.log_-GF_-p_--pickaxe-all_master
    + cp 0096-diff.log_-GF_-p_--pickaxe-all_master /home/avar/g/git/t/t4013/diff.log_-GF_-p_--pickaxe-all_master.MAYBE-USE-THIS
    + false

In any case, neither one of those is the patch I'm suggesting upthread.

I am suggesting not just to use test_when_finished, but that we should
use BUG here. Since not having the file itself is a bug in git's test
suite, not a mere failure of the test case. The test will never pass if
the file doesn't exist.

So even if I agreed with you that the more narrow migration to
test_when_finished would be a regression, I don't see how you think
*this* patch could be a regression.

You seem to just be focusing on the test_when_finished part of it, but
that's not all it's doing. Since we also use BUG the output is now:

    $ ./t4013-diff-various.sh 
    [...]
    error: bug in the test script: Have no "t4013/diff.log_-GF_-p_--pickaxe-all_master", new test? [...]

I.e. whether we used "rm" or "cp" or "test_when_finished 'rm [...]'"
earlier has become irrelevant to achieve the states aims in 3c2f75b590c
(t4013: add tests for diff/log family output options., 2006-06-26).

It doesn't matter how we're removing/copying etc. the file anymore,
that's no longer how we're helping the user debug the test.

tl;dr:

In any case, I think it's best to just drop this series. I wrote the
above more as a synopsis of what I belive we should be doing in general,
I do have some patches queued up to address the general debug-ability of
our test output (e.g. for CI).

For that end-goal having e.g. test_when_finished (mostly) consistently
used would help a lot, but not using it in any one test would be
fine. This is just one such case, and something I thought wouldn't be a
controversial patch.

I think at this point me trying to gleaming entrails of what you want &
you responding back is going to take both of us much more time than
replacing this series with a patch you'd be happy with, and this wasn't
something I cared about more than a one-off.