On Sun, Jul 03, 2022 at 07:54:11AM -0700, Bart Van Assche wrote:
> On 7/3/22 06:32, Theodore Ts'o wrote:
> > On Sat, Jul 02, 2022 at 02:48:12PM -0700, Bart Van Assche wrote:
> > >
> > > I strongly disagree with annotating tests with failure rates. My opinion is
> > > that on a given test setup a test either should pass 100% of the time or
> > > fail 100% of the time.
> >
> > My opinion is also that no child should ever go to bed hungry, and we
> > should end world hunger.
>
> In my view the above comment is unfair. The first year after I wrote the
> SRP tests in blktests I submitted multiple fixes for kernel bugs encountered
> by running these tests. Although it took a significant effort, after about
> one year the test itself and the kernel code it triggered finally resulted
> in reliable operation of the test. After that initial stabilization period
> these tests uncovered regressions in many kernel development cycles, even in
> the v5.19-rc cycle.
>
> Since I'm not very familiar with xfstests I do not know what makes the
> stress tests in this test suite fail. Would it be useful to modify the code
> that decides the test outcome to remove the flakiness, e.g. by only checking
> that the stress tests do not trigger any unwanted behavior, e.g. kernel
> warnings or filesystem inconsistencies?

Filesystems and the block layer sit on top of tons of other things in
the kernel, and those layers can introduce nondeterminism. To get
determinism in these tests we must first rule out nondeterminism in other
areas of the kernel, and that will take a long time. Things like kunit
tests will help here, along with adding more tests to other smaller
layers. The list is long.

At LSFMM I mentioned how blktests block/009 had an odd failure rate of
about 1/669 a while ago. The issue was real, and it took a while to
figure out the root cause.
Jan Kara's patches solved these issues and they are not trivial to
backport to ancient enterprise kernels ;)

Another more recent one was the nondeterministic RCU CPU stall warnings
with a failure rate of about 1/80 on zbd/006, and that led to some
interesting revelations about how qemu's use of discard was shitty and
just needed to be enhanced.

Yes, you can probably make zbd/006 more atomic and split it into 10
tests, but I don't think we can escape the lack of determinism in
certain areas of the kernel. We can *work to improve* it, but again,
that will take time, and I am not sure many folks really want that
either.

  Luis