Re: [RFC: kdevops] Standardizing on failure rate nomenclature for expunges

On 5/18/22 20:07, Luis Chamberlain wrote:
I've been promoting the idea that running fstests once is nice,
but things get interesting if you run fstests repeatedly until a
failure is found. It turns out kdevops has found tests which fail
at rates typically between 1/2 and 1/30. That is, 1/2 means a
failure happens on about 50% of runs, whereas 1/30 means it takes
about 30 runs to hit the failure.

I have tried my best to annotate failure rates when I know what
they might be on the test expunge list, as an example:

workflows/fstests/expunges/5.17.0-rc7/xfs/unassigned/xfs_reflink.txt:generic/530 # failure rate about 1/15 https://gist.github.com/mcgrof/4129074db592c170e6bf748aa11d783d

The annotation "failure rate about 1/15" is rather verbose, so I'd
like to propose a standardized, compact way to represent it. How about

generic/530 # F:1/15

Then we could extend the definition. F is the current estimate, which
may be based simply on how many runs it took to find the first
failure. A more valuable figure would be the average failure rate:
run the test to failure multiple times, say 10, and average the
results. That would be a more accurate representation. For this,
how about:

generic/530 # FA:1/15

This would mean the failure rate has been found to be about 1/15 on
average, as determined over 10 runs.
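
One nice side effect of a fixed tag is that the annotations become
trivially machine-parseable. A rough sketch of what extraction could
look like, assuming the F:/FA: syntax proposed above (the pipeline
itself is only illustrative):

  # Hypothetical: pull "test tag rate" triples out of annotated
  # expunge lines of the form "generic/530 # F:1/15".
  grep -rhE '# *FA?:[0-9]+/[0-9]+' workflows/fstests/expunges/ | \
      sed -E 's|^([^ ]+) +# *(FA?):([0-9]+)/([0-9]+).*|\1 \2 \3/\4|'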

We should also extend check in fstests/blktests to run a test
repeatedly until a failure is found and report back the number of
successful runs.
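
Something like the following loop captures the idea, assuming check's
exit status reflects whether the test passed (this is only a sketch of
the shape, not a proposed interface):

  # Run one test until its first failure, then report the observed rate.
  count=0
  while ./check generic/530; do
          count=$((count + 1))
  done
  echo "generic/530: $count passes before first failure (F:1/$((count + 1)))"

Repeating such a loop, say, 10 times and averaging the runs-to-failure
would yield the FA figure described above.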

Thoughts?

Note: yes, failure rates lower than 1/100 do exist, but they are rare
creatures. I love them though, as my experience so far shows that they
uncover hidden bones in the closet, and they may take months and a lot
of eyeballs to resolve.

I strongly disagree with annotating tests with failure rates. My
opinion is that on a given test setup a test should either pass 100%
of the time or fail 100% of the time. If a test passes in one run and
fails in another, that indicates either a bug in the test or a bug in
the software being tested. Examples of behaviors that can cause tests
to behave unpredictably are use-after-free bugs and race conditions.
How likely it is to trigger such behavior depends on a number of
factors, and could even depend on external factors like which network
packets are received from other systems.

I do not expect flaky tests to have an exact failure rate. Hence my
opinion that flaky tests are not useful, and that it is not useful to
annotate them with a failure rate. If a test is flaky, I think the
root cause of the flakiness must be determined and fixed.

Bart.


