Re: [PATCH 1/3] [RFC] tests: add test_todo() to mark known breakages

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Wed, 07 Dec 2022 13:08:25 +0100

On Tue, Dec 06 2022, Victoria Dye wrote:

> Phillip Wood via GitGitGadget wrote:
>> From: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
>> 
>> test_todo() is intended as a fine grained replacement for
>> test_expect_failure(). Rather than marking the whole test as failing
>> test_todo() is used to mark individual failing commands within a
>> test. This approach to writing failing tests allows us to detect
>> unexpected failures that are hidden by test_expect_failure().
>
> I love this idea! I've nearly been burned a couple of times by the wrong
> line in a 'test_expect_failure' triggering the error (e.g., due to bad
> syntax earlier in the test). The added specificity of 'test_todo' will help
> both reviewers and people fixing the underlying issues demonstrated by
> expected-failing tests.
>
>> 
>> Failing commands are reported by the test harness in the same way as
>> test_expect_failure() so there is no change in output when migrating
>> from test_expect_failure() to test_todo(). If a command marked with
>> test_todo() succeeds then the test will fail. This is designed to make
>> it easier to see when a command starts succeeding in our CI compared
>> to using test_expect_failure() where it is easy to fix a failing test
>> case and not realize it.
>> 
>> test_todo() is built upon test_expect_failure() but accepts commands
>> starting with test_* in addition to git. As our test_* assertions use
>> BUG() to signal usage errors any such error will not be hidden by
>> test_todo().
>
> Should this be so restrictive? I think 'test_todo' would need to handle any
> arbitrary command (mostly because of custom functions like
> 'ensure_not_expanded' in 't1092') to be an easy-to-use drop-in replacement
> for 'test_expect_failure'. 
>
> I see there's some related discussion in another subthread [1], but I don't
> necessarily think removing restrictions (i.e. that the tested command must
> be 'git', 'test_*', etc.) on 'test_todo' requires doing the same for
> 'test_must_fail' et al. to be internally consistent. On one hand,
> 'test_todo' could be interpreted as an assertion (like 'test_must_fail'),
> where we only want to assert on our code - hence the restrictions. From that
> perspective, it would make sense to ease restrictions uniformly on all of
> our assertion helpers. 
>
> On the other hand, I'm interpreting 'test_todo' as
> 'test_expect_failure_on_line_N' - more of a "post-test result interpreter"
> than an assertion helper. So because 'test_expect_failure' doesn't require
> the failing line to come from a particular command, I don't think
> 'test_todo' needs to either. That leaves assertion helpers like
> 'test_must_fail' out of the scope of this change, avoiding any hairiness of
> allowing them to assert on arbitrary code.
>
> What do you think?

Are you saying that for "test_todo" we shouldn't care whether the command
exits with a "normal" non-zero status or dies from a segfault, abort()
(e.g. BUG()) etc.? That's what the "test_must_fail" vs. "!" distinction
is about.

Even if we erased that distinction I think such a thing would be a
marginal improvement on "test_expect_failure", as we'd at least mark
which line fails, but like "test_expect_failure" we'd accept segfaults as
failures.

But as noted in the upthread discussions I think we should do better and
still check for segfaults etc. I think we have a couple of
"test_expect_failure" tests now where we expect a segfault, but for the
rest we'd like to know if they start segfaulting.
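
To illustrate the distinction, here is a rough sketch of telling an
ordinary non-zero exit apart from death by signal. "classify_exit" is a
made-up name for this example, not something in test-lib, and the
sketch assumes a POSIX shell where a command killed by a signal is
reported with an exit status greater than 128:

```shell
#!/bin/sh
# Hypothetical helper for illustration only (not part of git's test-lib):
# run a command and report how it exited.
classify_exit () {
	"$@" >/dev/null 2>&1
	code=$?
	if test "$code" -eq 0
	then
		echo success
	elif test "$code" -gt 128
	then
		# Assumption: the shell reports death by signal N as 128 + N
		echo "signal $((code - 128))"
	else
		echo "failure $code"
	fi
}

classify_exit true
classify_exit false
# The child shell sends SIGSEGV to itself to simulate a crash
classify_exit sh -c 'kill -s SEGV $$'
```

A plain "!" treats the last two cases the same, which is the problem
with accepting any failure as the expected one.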