test-lib.sh musings: test_expect_failure considered harmful

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 12 Oct 2021 11:23:40 +0200

On Mon, Oct 11 2021, Junio C Hamano wrote:

[Removed "In-reply-to: <xmqq5yu3b80j.fsf@gitster.g>" with the Subject
change]

> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>>  test_expect_success POSIXPERM,SANITY 'commit should notice unwritable repository' '
>>  	test_when_finished "chmod 775 .git/objects .git/objects/??" &&
>>  	chmod a-w .git/objects .git/objects/?? &&
>> -	test_must_fail git commit -m second
>> +
>> +	cat >expect <<-\EOF &&
>> +	error: insufficient permission for adding an object to repository database .git/objects
>> +	error: insufficient permission for adding an object to repository database .git/objects
>> +	error: Error building trees
>> +	EOF
>
> This is odd.  Shouldn't the test expect one message from write-tree
> and be marked as expecting a failure until the bug gets fixed?

Presumably with test_expect_failure.

I'll change it, in this case we'd end up with a test_expect_success at
the end, so it doesn't matter much & I don't care.

But FWIW $subject, or at least s/harmful/running with scissors/g :)

[CC'd some recent-ish users of test_expect_failure, and I'm no innocent
in that department :)]

In the Perl world (Test::More et al) the "#TODO" keyword we map
test_expect_failure to (and yeah, I know the latter pre-dates the
former...) doesn't generally lead to subtle breakages and mismatched
expectations, i.e. you do:

    TODO: {
        local $TODO = "not implemented yet";
        is($a, $b, "this is why this in particular fails");
    }

So you generally mark the *specific* thing that fails, as separate from
your test setup itself.

But our test-lib.sh API for it is the equivalent of marking an entire
logical test block and its setup as a TODO.

So the diff below "passes". But did we intend for the test_cmp to fail,
for the thing to segfault or hit a BUG?

Any of those conditions being hit will have the TODO test pass. So will
all of it succeeding.

=== snip ===

diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index df544bb321f..15724e6a358 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -601,4 +601,13 @@ test_expect_success 'branch -m with the initial branch' '
 	test again = $(git -C rename-initial symbolic-ref --short HEAD)
 '
 
+test_expect_failure 'do stuff' '
+	git config alias.fake-SEGV "!f() { echo Fake SEGV; exit 139; }; f" &&
+	git config alias.fake-BUG "!f() { echo Fake BUG; exit 99; }; f" &&
+
+	git fake-BUG >expect &&
+	git fake-SEGV >actual &&
+	test_cmp expect actual
+'
+
 test_done

=== snip ===

(Although for the "suceeding" case we'll print out a summary from
"prove", but unless you're carefully eyeballing that...).

So I think "test_expect_failure" should be avoided, the only useful way
of holding it which works in combination with other test-lib.sh features
that I've come up with is:

	test_expect_success 'setup flaky failure' '
		[multi-line test code that passes here] &&
		>setup-todo
	'

	if test -e setup-todo
	then
		test_expect_failure 'flaky failure due to XYZ' '
			test_cmp todo.expect todo.actual
		'
	fi

I.e.:

 * Don't say that the failure of your passing test setup is OK too.
 * In doing that don't break --run=N, so that "test -e setup-todo" test
   (or equivalent) is needed, in case the "setup" is skipped.
 * Have *just* the "test_cmp" (or other specific failure test) in the
   "test_expect_failure"

But it's only useful if you can't make that a "! test_cmp" (or rather, a
more specific positive & passing "test_cmp".

I.e. it's flaky, or the output/end state is otherwise unknown (but we
expect it to be once bugs are fixed).

We have ~150 uses of test_expect_failure in the test suite, I'm pretty
sure that <20 of them at most are "correct" under the above
criteria. E.g. this is ok-ish:
    
    t7815-grep-binary.sh-# This test actually passes on platforms where regexec() supports the
    t7815-grep-binary.sh-# flag REG_STARTEND.
    t7815-grep-binary.sh-test_expect_success 'git grep ile a' '
    t7815-grep-binary.sh-   git grep ile a
    t7815-grep-binary.sh-'

Although in that case we should make it a test_expect_success if we can
get a "REG_STARTEND" build flag exported to the test suite. Skimming the
grep hits *maybe* some of the ones in "t9602-cvsimport-branches-tags.sh"
(I haven't looked carefully).

But most of them I Consider Harmful, i.e. they're a bunch of setup code
that could be hiding an unexpected bug/segfault. Running into that with
some past WIP work (a thing I considered to "just fail a test_cmp"
started segfaulting) is why I try to avoid it.