Re: [PATCH nft 1/3] tests/shell: skip "table_onoff" test if kernel patch is missing

On Tue, 2023-10-17 at 11:32 +0200, Florian Westphal wrote:
> Thomas Haller <thaller@xxxxxxxxxx> wrote:
> > As you probably run a self-built kernel, wouldn't you just `export
> > NFT_TEST_FAIL_ON_SKIP=y` and reject all skips as failures? What's
> > the
> > problem with that? That exists exactly for your use case.
> No, its not my use case.
> The use case is to make sure that the frankenkernels that I am in
> charge
> of do not miss any important bug fixes.
> This is the reason for the feature probing, "skip" should tell me
> that
> I can safely ignore it because the feature is not present.
> I could built a list of "expected failures", but that will mask real
> actual regressions.

How did you handle that, before the recent addition of the skip
functionality? Did you just have a list of known failures, and manually
ignored them?

Anyway, the "eval-exit-code" in v2 can easily honor an environment
variable, to always fail hard. The only question is how exactly it
should work.

I propose that NFT_TEST_FAIL_ON_SKIP=y should honor a variable
"NFT_TEST_FAIL_ON_SKIP_EXCEPT=", which takes a regex of test names for
which a skip is *not* fatal (you opt in the tests that are allowed to
be skipped). If you maintain c9s, the list of known skipped tests is small
and relatively static. You can maintain a per-kernel-variant regex in
that case.
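
A minimal sketch of how that decision could look, assuming a
hypothetical helper named skip_exit_code (it is not part of the
current test suite) that is called with the name of a test that wants
to skip and prints the exit code to use:

```shell
# Hypothetical helper, for illustration only.
# $1 is the name of a test that wants to skip.
# Prints 77 (skip) or 1 (hard failure).
skip_exit_code() {
    testname="$1"

    # Without NFT_TEST_FAIL_ON_SKIP=y, a skip stays a skip.
    if [ "${NFT_TEST_FAIL_ON_SKIP:-n}" != y ]; then
        echo 77
        return 0
    fi

    # With NFT_TEST_FAIL_ON_SKIP=y, a skip is tolerated only if the
    # test name matches the (extended) regex in
    # NFT_TEST_FAIL_ON_SKIP_EXCEPT.
    if [ -n "${NFT_TEST_FAIL_ON_SKIP_EXCEPT:-}" ] &&
       printf '%s\n' "$testname" |
           grep -Eq -- "$NFT_TEST_FAIL_ON_SKIP_EXCEPT"; then
        echo 77
        return 0
    fi

    echo 1
}
```

An empty or unset NFT_TEST_FAIL_ON_SKIP_EXCEPT keeps the current
behavior of NFT_TEST_FAIL_ON_SKIP=y: every skip is fatal.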

If we want, we can even parse /etc/os-release and uname and code a
default list of regexes.
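
Roughly like the following sketch. The function name
default_skip_except and the test names in the regex are placeholders
that I made up; they say nothing about which tests any real kernel
skips. It only looks at an os-release file and leaves out uname:

```shell
# Hypothetical helper, for illustration only.
# $1 is the path to an os-release file (normally /etc/os-release).
# Prints a default regex for NFT_TEST_FAIL_ON_SKIP_EXCEPT, or an empty
# line if nothing is known about the distribution.
default_skip_except() {
    # os-release files consist of shell-compatible assignments, so
    # source the file in a subshell and print "ID-VERSION_ID".
    id_ver="$( . "$1" && printf '%s-%s' "${ID:-}" "${VERSION_ID:-}" )"

    case "$id_ver" in
        centos-9)
            # Placeholder entries, not real test names.
            echo '^(example/known_skip_a|example/known_skip_b)$'
            ;;
        *)
            echo ''
            ;;
    esac
}
```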

> > > This is a bug, and it tells me that I might have to do something
> > > about it.
> > 
> > OK, do you intend to fix this bug in a very timely manner on Fedora
> > 38
> > (and other popular kernels)? Then maybe hold back the test until
> > that
> > happend? (or let it skip for now, and in a few weeks, upgrade to
> > hard
> > failure -- the only problem is not to forget about that).
> I did keep the test back until I saw that -stable had picked it up.
> I can wait longer, sure.

I think it is good to merge tests soon. There just needs to be a
reasonable+convenient way to handle the problem.

Having a policy that requires you to wait is broken. Especially, since
it's unclear how long to wait. You are not waiting for yourself, but
for any unknown user who is affected.

> > Ah right. "tests/shell/testcases/transactions/table_onoff" is fixed
> > on
> > 6.5.6-200.fc38.x86_64. There still is a general problem. For
> > example
> > what about tests/shell/testcases/packetpath/vlan_8021ad_tag ?
> Its also a bug that needs to be fixed in the kernel.
> I applied it after stable had picked it up for 6.5.7.
> > 1) the test would exit 78 instead of 77. And would
> > treat 78
> > either as failure or as skip, based on NFT_TEST_FAIL_ON_SKIP
> > 
> > 2) the test itself could look at NFT_TEST_FAIL_ON_SKIP and decide
> > whether to exit with 77 or 1.
> > 
> > 
> > Or how about adding a mechanism, that compares the kernel version
> > and
> > decides whether to skip? For example
> I don't think that kernel versions work or are something that we can
> realistically handle.  Even just RHEL would be a nightmare if one
> considers all the different release streams.
> I think even just handling upstream -stable is too much work.

I think the kernel versions work reasonably well for upstream and
Fedora kernels (which is something already!).

I guess, there could be a smarter

  "$NFT_TEST_BASEDIR/helpers/eval-exit-code" kernel  upstream-6.6  upstream-6.5.6  c9s-5.14.0-373

that also can cover different "streams" (e.g. the uname from a centos).
But I like a NFT_TEST_FAIL_ON_SKIP_EXCEPT= better.
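
For reference, a rough sketch of the version check for the upstream
case only. The function names and the matching rules are my
assumptions: a two-component entry such as 6.6 matches every kernel
at or above that release, while a three-component entry such as 6.5.6
only matches within the same stable series (6.5.y). Stream prefixes
like "upstream-" or "c9s-" are left out, and the comparison relies on
"sort -V" as provided by GNU coreutils:

```shell
# version_ge A B: succeeds if version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_has_fix KVER FIXED...: succeeds if kernel version KVER is
# covered by one of the FIXED versions. Hypothetical helper.
kernel_has_fix() {
    kver="${1%%-*}"    # strip a suffix such as "-200.fc38.x86_64"
    shift

    for fixed in "$@"; do
        case "$fixed" in
            *.*.*)
                # Stable release: only applies to the same X.Y series.
                series="${fixed%.*}"
                case "$kver" in
                    "$series" | "$series".*) ;;
                    *) continue ;;
                esac
                version_ge "$kver" "$fixed" && return 0
                ;;
            *)
                # Mainline release: applies to everything newer.
                version_ge "$kver" "$fixed" && return 0
                ;;
        esac
    done
    return 1
}
```

With that, `kernel_has_fix "$(uname -r)" 6.6 6.5.6` would succeed on
6.5.6-200.fc38.x86_64, and the helper could then treat the test result
as a hard failure instead of a skip.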

Also, at worst on the Frankenkernel you get a SKIP when it should have
been a FAIL. For the non-expert user who writes a patch to fix a typo,
the SKIP is better during `make check`.

On upstream/Fedora kernels, you also don't need anything, and
"eval-exit-code" will end up doing the right thing automatically.

And if you maintain CentOS9Stream, then set NFT_TEST_FAIL_ON_SKIP=y and
NFT_TEST_FAIL_ON_SKIP_EXCEPT=<REGEX> and keep track of the tests that
are known to fail. You know your kernel, and the tests that are known
to be skipped.

How about that?

> That said, I hope that these kinds of tests will happen less
> frequently
> over time.

I like the optimism :)


