Re: [RFC: kdevops] Standardizing on failure rate nomenclature for expunges

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Thu, 7 Jul 2022 14:16:13 -0700

On Sun, Jul 03, 2022 at 07:54:11AM -0700, Bart Van Assche wrote:
> On 7/3/22 06:32, Theodore Ts'o wrote:
> > On Sat, Jul 02, 2022 at 02:48:12PM -0700, Bart Van Assche wrote:
> > > 
> > > I strongly disagree with annotating tests with failure rates. My opinion is
> > > that on a given test setup a test either should pass 100% of the time or
> > > fail 100% of the time.
> > 
> > My opinion is also that no child should ever go to bed hungry, and we
> > should end world hunger.
> 
> In my view the above comment is unfair. The first year after I wrote the
> SRP tests in blktests I submitted multiple fixes for kernel bugs encountered
> by running these tests. Although it took a significant effort, after about
> one year the test itself and the kernel code it triggered finally resulted
> in reliable operation of the test. After that initial stabilization period
> these tests uncovered regressions in many kernel development cycles, even in
> the v5.19-rc cycle.
> 
> Since I'm not very familiar with xfstests I do not know what makes the
> stress tests in this test suite fail. Would it be useful to modify the code
> that decides the test outcome to remove the flakiness, e.g. by only checking
> that the stress tests do not trigger any unwanted behavior, e.g. kernel
> warnings or filesystem inconsistencies?

Filesystems and the block layer are built on top of tons of other things
in the kernel, and any of those layers can introduce nondeterminism. To
get deterministic results we must first rule out nondeterminism in those
other areas of the kernel, and that will take a long time. Things like
KUnit tests will help here, along with adding more tests for the other,
smaller layers. The list is long.
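
To put rough numbers on the layering point: if you (simplistically)
assume each layer independently misbehaves with some small per-run
probability p_i, the chance that one test run trips over at least one of
them is 1 - prod(1 - p_i). The sketch below assumes that independence,
which real kernel subsystems do not satisfy, so treat it as a rough
illustration only; the function name is hypothetical.

```python
import math


def combined_failure_rate(layer_rates):
    """Probability that at least one layer misbehaves in a single run.

    Assumption: every layer fails independently with the given per-run
    probability. Real subsystems interact, so this is only a rough model.
    """
    # Probability that every layer behaves during the run.
    pass_prob = math.prod(1.0 - p for p in layer_rates)
    return 1.0 - pass_prob
```

With three layers at 1/1000 each, the combined rate is already about
3/1000, so a test sitting on top looks roughly three times flakier than
any single layer beneath it.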

At LSFMM I mentioned how blktests block/009 had an odd failure rate of
about 1/669 a while ago. The issue was real, and it took a while to
figure out what the root cause was. Jan Kara's patches solved it, and
they are not trivial to backport to ancient enterprise kernels ;)
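
A rate like 1/669 also shows why these are so hard to chase: if each run
fails independently with probability p, the chance of seeing no failure
in n runs is (1 - p)^n, so reaching a given confidence c of hitting the
bug at least once needs n >= ln(1 - c) / ln(1 - p) runs. A minimal
sketch, assuming independent runs; the helper name is hypothetical:

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Smallest n with P(at least one failure in n runs) >= confidence.

    Assumes each run fails independently with probability failure_rate.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Solve (1 - p)**n <= 1 - c for n; log1p(-p) is ln(1 - p), accurate for small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-failure_rate))
```

For 1/669 this comes out to roughly 2,000 runs for 95% confidence, and
for a 1/80 rate roughly 240 runs, before you even start bisecting.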

Another, more recent one was the nondeterministic RCU CPU stall warnings
with a failure rate of about 1/80 on zbd/006, and that led to some
interesting revelations about how qemu's use of discard was shitty and
just needed to be enhanced.
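
Since the thread is about failure rate nomenclature, note that a label
like "1/80" is itself an estimate from a finite number of runs. One way
to record it is the observed k/n rendered as "~1/N" together with a
Wilson score interval. This is a sketch only: the "~1/N (95% CI ...)"
string format and the function names are hypothetical, not anything
kdevops or fstests currently uses.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Wilson score interval for a failure proportion (z=1.96 is ~95%)."""
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    p = failures / runs
    denom = 1.0 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)


def describe_failure_rate(failures, runs):
    """Render an observed rate in a hypothetical '~1/N' expunge-comment style."""
    lo, hi = wilson_interval(failures, runs)
    if failures == 0:
        # No failures seen: only an upper bound on the rate is meaningful.
        return f"0/{runs} (upper bound ~1/{math.floor(1 / hi)})"
    return f"~1/{round(runs / failures)} (95% CI {lo:.4f}-{hi:.4f})"
```

For example, 5 failures in 400 runs renders as "~1/80" with an interval
spanning roughly 1/190 to 1/35, which is a useful reminder of how loose
these numbers are.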

Yes, you can probably make zbd/006 more atomic and split it into 10
tests, but I don't think we can escape the lack of determinism in
certain areas of the kernel. We can *work to improve* it, but again,
that will take time, and I am not quite sure many folks really want
that either.

  Luis