On Thu, May 19, 2022 at 03:58:31PM +0100, Matthew Wilcox wrote:
> On Thu, May 19, 2022 at 07:24:50PM +0800, Zorro Lang wrote:
> > Yes, we talked about this, but if I don't rememeber wrong, I recommended each
> > downstream testers maintain their own "testing data/config", likes exclude
> > list, failed ratio, known failures etc. I think they're not suitable to be
> > fixed in the mainline fstests.
>
> This assumes a certain level of expertise, which is a barrier to entry.
>
> For someone who wants to check "Did my patch to filesystem Y that I have
> never touched before break anything?", having non-deterministic tests
> run by default is bad.
>
> As an example, run xfstests against jfs. Hundreds of failures, including
> some very scary-looking assertion failures from the page allocator.
> They're (mostly) harmless in fact, just being a memory leak, but it
> makes xfstests useless for this scenario.
>
> Even for well-maintained filesystems like xfs which is regularly tested,
> I expect generic/270 and a few others to fail. They just do, and they're
> not an indication that *I* broke anything.
>
> By all means, we want to keep tests around which have failures, but
> they need to be restricted to people who have a level of expertise and
> interest in fixing long-standing problems, not people who are looking
> for regressions.

It's hard to be sure whether a failure is a regression if someone only
runs the test once. The tester needs some experience, or at least some
historical test data. If a tester finds a case that fails about 10% of
the time on his system, and he doesn't have historical test data, then
to decide whether it's a regression he at least needs to run the same
test several more times on an older kernel version on the same system.
If it never fails on the old kernel but can fail on the new kernel,
then we suspect it's a regression. Even if the tester isn't an expert
in the fs he's testing, he can report the issue to that fs's experts
to get more checking. For a downstream kernel, he has to report it to
the downstream maintainers, or check it by himself. If a case passes
on upstream but fails on downstream, it might mean there's an upstream
patchset that can be backported.

So, anyway, testers need their own "experience" (including historical
test data, known issues, etc.) to judge whether a failure is a
suspected regression, or a known downstream issue which hasn't been
fixed (by backporting) yet.

That's my personal perspective :)

Thanks,
Zorro
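
For what it's worth, here is a rough sketch of the kind of repeated-run
comparison I mean. The test name, the run count and the fstests path
are only placeholders, and it assumes ./check exits non-zero when a
test fails:

    # Run the same flaky test 20 times and count the failures.
    cd /path/to/xfstests
    fail=0
    for i in $(seq 1 20); do
            ./check generic/270 >/dev/null 2>&1 || fail=$((fail + 1))
    done
    echo "generic/270: $fail/20 runs failed on $(uname -r)"

Run the same loop once on the old kernel and once on the new kernel.
If the test never fails on the old kernel but does fail on the new one,
that failure is a suspected regression worth reporting; if it fails at
a similar rate on both, it's more likely a long-standing known issue.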