Re: test-lib.sh musings: test_expect_failure considered harmful

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 13 Oct 2021 12:10:43 +0200

On Tue, Oct 12 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> On Mon, Oct 11 2021, Junio C Hamano wrote:
> [...]
>> Presumably with test_expect_failure.
>>
>> I'll change it, in this case we'd end up with a test_expect_success at
>> the end, so it doesn't matter much & I don't care.
>
> I do agree with you that compared to expect_success, which requires
> _all_ steps to succeed, so a failure in any of its steps is
> immediately noticeable, it is harder to write and keep
> expect_failure useful, because it is not like we are happy to see
> any failure in any step.  We do not expect a failure in many
> preparation and conclusion steps in the &&-chain in expect_failure
> block, and we consider it an error if these steps fail.  We
> want to mark only a single step to exhibit an expected but undesirable
> behaviour.
>
> But even with the shortcomings of expect_failure, it still is much
> better than claiming that we expect a bogus outcome.
>
> Improving the shortcomings of expect_failure would be a much better
> use of our time than advocating an abuse of expect_success, I would
> think.

I'd like to improve it, but I'll have to get any patch in this area past
you :)

My reading of your opinion from past exchanges is that you find it
objectionable to say "this is a success" when it's not the /desired/
behavior, whereas I think it's valuable to just test for and document
the exact existing behavior, even if it's not desirable. So you don't
really need a function different from test_expect_success, just a
comment saying "this should change", or a "TODO" in the description of
the test (without a hash, so it's not TAP syntax).
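
To illustrate the "non-hash" point, here is a minimal sketch of how a
TAP consumer tells the two apart. tap_has_todo_directive is a
hypothetical helper of mine, not something in test-lib.sh, and it only
matches the exact upper-case "# TODO" spelling, which is a
simplification:

```shell
#!/bin/sh
# Hypothetical helper, not part of test-lib.sh: succeed if a TAP result
# line carries a TODO directive, i.e. a "#" followed by TODO.
tap_has_todo_directive () {
	case "$1" in
	*"# TODO"*) return 0 ;;
	*) return 1 ;;
	esac
}

# "TODO" as ordinary words in the description of a passing test.
plain='ok 7 - TODO git grep .fi a'
# "TODO" as a directive, the form test_expect_failure emits.
directive='not ok 7 - git grep .fi a # TODO known breakage'

if tap_has_todo_directive "$plain"
then
	plain_result=todo
else
	plain_result=ordinary
fi

if tap_has_todo_directive "$directive"
then
	directive_result=todo
else
	directive_result=ordinary
fi

echo "plain: $plain_result"
echo "directive: $directive_result"
```

Only the second line is treated as a TODO test; the first is an
ordinary "ok" whose description happens to contain the word.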

But if you agree that we shouldn't conflate failures in the different
steps I think we're getting somewhere, so to begin with what do you
think about the hack in the v2 of my series?
https://lore.kernel.org/git/cover-v2-0.2-00000000000-20211012T142950Z-avarab@xxxxxxxxx/

If we were to promote those semantics to something that
test_expect_failure would use, it would be the below, which I think is
the only sensible way to use it.

But that would mean changing all existing test_expect_failure uses in
the test suite, so it would need either a pretty large patch, or some
incremental steps to get there.

It would also mean we can't use it for any test that's actually flaky,
so we'd need a test_expect_flaky, or some test-specific workarounds
in those areas.

diff --git a/t/t7815-grep-binary.sh b/t/t7815-grep-binary.sh
index 90ebb64f46e..9a95c9e7d69 100755
--- a/t/t7815-grep-binary.sh
+++ b/t/t7815-grep-binary.sh
@@ -64,7 +64,7 @@ test_expect_success 'git grep ile a' '
 '
 
 test_expect_failure 'git grep .fi a' '
-	git grep .fi a
+	test_must_fail git grep .fi a
 '
 
 test_expect_success 'grep respects binary diff attribute' '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 8361b5c1c57..6d9291b7ead 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -728,8 +728,8 @@ test_known_broken_ok_ () {
 	then
 		write_junit_xml_testcase "$* (breakage fixed)"
 	fi
-	test_fixed=$(($test_fixed+1))
-	say_color error "ok $test_count - $@ # TODO known breakage vanished"
+	test_broken=$(($test_broken+1))
+	say_color warn "not ok $test_count - $@ # TODO known breakage"
 }
 
 test_known_broken_failure_ () {
@@ -737,8 +737,8 @@ test_known_broken_failure_ () {
 	then
 		write_junit_xml_testcase "$* (known breakage)"
 	fi
-	test_broken=$(($test_broken+1))
-	say_color warn "not ok $test_count - $@ # TODO known breakage"
+	test_fixed=$(($test_fixed+1))
+	say_color error "not ok $test_count - $@ # TODO a 'known breakage' changed behavior!"
 }
 
 test_debug () {
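
The semantics the diff above is aiming for can be modelled outside
test-lib.sh with a small standalone sketch. Both must_fail and
expect_failure_strict are hypothetical stand-ins I made up for
illustration; in particular the real test_must_fail is stricter about
how the wrapped command is allowed to fail, which this toy version
ignores:

```shell
#!/bin/sh
# Hypothetical stand-in for test_must_fail: succeeds only when the
# wrapped command exits non-zero.
must_fail () {
	if "$@"
	then
		return 1
	fi
	return 0
}

# Hypothetical stand-in for the proposed test_expect_failure: the body
# marks its one expected-to-fail step with must_fail, so a passing body
# means "still broken" and a failing body means "something changed".
expect_failure_strict () {
	desc=$1 body=$2
	if (eval "$body") >/dev/null 2>&1
	then
		echo "not ok - $desc # TODO known breakage"
	else
		echo "not ok - $desc # TODO a 'known breakage' changed behavior!"
	fi
}

# The marked step still fails: the known breakage is intact.
out_still_broken=$(expect_failure_strict 'buggy step still fails' \
	'true && must_fail false && true')
# A setup step fails: no longer silently counted as "known breakage".
out_setup_broke=$(expect_failure_strict 'setup step fails' \
	'false && must_fail false')
# The marked step now succeeds: the bug was fixed, so behavior changed.
out_fixed=$(expect_failure_strict 'bug got fixed' \
	'true && must_fail true')

echo "$out_still_broken"
echo "$out_setup_broke"
echo "$out_fixed"
```

Only the first case is reported as a known breakage. A failing setup
step and a fixed bug both end up in the "changed behavior" branch,
which is the conflation between steps that this is meant to remove.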