On Wed, Oct 13 2021, Junio C Hamano wrote:

> Derrick Stolee <stolee@xxxxxxxxx> writes:
>
>>> But even with the shortcomings of expect_failure, it still is much
>>> better than claiming that we expect a bogus outcome.
>>>
>>> Improving the shortcomings of expect_failure would be a much better
>>> use of our time than advocating an abuse of expect_sucess, I would
>>> think.
>>
>> I agree that test_expect_failure has these drawbacks. I've recently
>> been using _expect_success to document "bad" behavior so we can verify
>> that behavior changes when that behavior is fixed. But it does have
>> the drawback of looking like we claim the result is by design.
>
> Yeah, I think I saw (and I think I used the same technique myself)
> people expect a bad output with test_expect_success with an in-code
> (not in-log) comment that explicitly says "This documents the
> current behaviour, which is wrong", and that is a very acceptable
> solution, I would think.
>
>> One possible way to correct this is to create a "test_expected_failure"
>> helper that could be placed on the step(s) of the &&-chain that are
>> expected to fail. The helper could set some variable to true if the
>> failure is hit, and false otherwise. It can also convert a failure
>> into a positive result. Then, test_expect_failure could look for that
>> variable's value (after verifying that the &&-chain returns success)
>> to show that all expected failures completed correctly.
>
> Yup, I would very much like the direction, and further imagine that
> the above approach can be extended to ...
>
>> This could have the side-effect of having a "fixed" test_expect_failure
>> show as a failed test, not a "TODO" message.
>
> ... avoid such downside. Perhaps call that magic "we know this step
> fails currently" test_known_breakage and declare that we deprecate
> the use of test_expect_failure in new tests.
> Such a test might look like this:
>
> test_expect_success 'commit error message should not duplicate' '
>	test_when_finished "chmod -R u+rwx ." &&
>	chmod u-rwx .git/objects/ &&
>	orig_head=$(git rev-parse HEAD) &&
>	test_must_fail git commit --allow-empty -m "read-only" 2>rawerr &&
>	grep "insufficient permission" rawerr >err &&
>	test_known_breakage test_line_count = 1 err &&
>	new_head=$(git rev-parse HEAD) &&
>	test "$orig_head" = "$new_head"
> '
>
> which may use your trick to turn both failure and success to OK (to
> let the remainder of the test to continue) but signal the
> surrounding test_expect_success to say either "TODO know breakage"
> or "Fixed".

I don't see how it's a downside. Considering the behavior bad now
shouldn't entail that we should be fuzzy about testing what exactly
*is* happening right now. If one bad but expected state turns into
another unexpected bad state, the test should fail.

In this case this thread spawned off a fix where we print an error
twice, instead of once:

  https://lore.kernel.org/git/cover-v2-0.2-00000000000-20211012T142950Z-avarab@xxxxxxxxx/#t

That sucks a bit, but printing pages full of such errors in a loop
would be way worse, which we'll hide if we insist on not testing the
exact emitted output, or on such "test_known_breakage" helpers.

It's only a downside in that when we fix bugs we'll need to go
through the "expected failure" tests and adjust them, but that seems
like a feature to me. As it is, I can submit a patch that fixes a
known bug with no test suite changes, and I might not even notice
that I fixed one.

We may want to have tests for something that really is
nondeterministic, e.g. for the code I added in 2d3c02f5db6 (die():
stop hiding errors due to overzealous recursion guard, 2017-06-21).
But that'll only be the case for some tiny minority (or none) of the
existing callers of "test_expect_failure".

1. https://lore.kernel.org/git/87tuhmk19c.fsf@xxxxxxxxxxxxxxxxxxx/
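For concreteness, the helper Stolee describes upthread could be
sketched roughly as below. This is illustrative only, not test-lib.sh
code: the name test_known_breakage comes from the quoted proposal, but
the variable name known_breakage_hit and the wiring are assumptions,
and a real implementation would have to integrate with the harness's
TODO/"Fixed" reporting.

```shell
# Sketch of the proposed helper. "known_breakage_hit" is an
# illustrative name, not an existing test-lib.sh variable.
test_known_breakage () {
	if "$@"
	then
		# The wrapped step passed: the known breakage is fixed.
		known_breakage_hit=false
	else
		# The wrapped step failed, as currently expected (TODO).
		known_breakage_hit=true
	fi
	# Return success either way so the rest of the &&-chain keeps
	# running; the surrounding test would inspect
	# $known_breakage_hit to report "TODO known breakage" vs.
	# "Fixed".
	return 0
}

# Toy usage: "false" stands in for a step that is expected to fail.
test_known_breakage false &&
echo "known_breakage_hit=$known_breakage_hit"
# prints: known_breakage_hit=true
```

Note that this is exactly the fuzziness objected to above: the wrapped
step is converted to success whether it fails or not, so a shift from
one bad state to another bad state would go unnoticed by the &&-chain
itself.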