On Thu, May 19, 2022 at 07:24:50PM +0800, Zorro Lang wrote:
> Yes, we talked about this, but if I don't remember wrong, I recommended
> each downstream tester maintain their own "testing data/config", like
> exclude lists, failure ratios, known failures, etc.  I think they're
> not suitable to be fixed in the mainline fstests.

Failure ratios are the sort of thing that are only applicable for:

* A specific filesystem
* A specific configuration
* A specific storage device / storage device class
* A specific CPU architecture / CPU speed
* A specific amount of memory available

Put another way, there are problems that fail so rarely as to be
effectively "never" on, say, an x86_64-class server with gobs and gobs
of memory, but which can fail much more reliably on, say, a Raspberry
Pi using eMMC flash.

I don't think that Luis was suggesting that this kind of failure
annotation would go in upstream fstests.  I suspect he just wants to
use it in kdevops, and hopes that other people would use it as well in
other contexts.  But even in the context of test runners like kdevops
and {kvm,gce,android}-xfstests, it's going to be very specific to a
particular test environment, even for the global list of excludes for
a particular file system.  So in the gce-xfstests context, this is the
difference between the excludes in the files:

    fs/ext4/excludes

vs

    fs/ext4/cfg/bigalloc.exclude

Even if I only cared about, say, how things ran on GCE using SSD-backed
Persistent Disk (never mind that I can also run gce-xfstests on Local
SSD, PD Extreme, etc.), failure percentages would never make sense for
fs/ext4/excludes, since that covers multiple file system configs.  And
my infrastructure supports kvm, gce, and Android, as well as some
people (such as at $WORK for our data center kernels) who run the test
appliance directly on bare metal, so I wouldn't use the failure
percentages in these files, etc.

Now, what I *do* do is track this sort of thing in my own notes, e.g.:

    generic/051    ext4/adv   Failure percentage: 16% (4/25)
        "Basic log recovery stress test - do lots of stuff, shut down
        in the middle of it and check that recovery runs to completion
        and everything can be successfully removed afterwards."

    generic/410    nojournal  Couldn't reproduce after running 25 times
        "Test mount shared subtrees, verify the state transitions..."

    generic/68[12] encrypt    Failure percentage: 100%
        The directory does grow, but blocks aren't charged to either
        root or the non-privileged user's quota.  So this appears to
        be a real bug.

There is one thing that I'd like to add to upstream fstests, and that
is some kind of option so that "check --retry-failures NN" would cause
fstests, upon finding a test failure, to automatically rerun that
failing test NN additional times.

Another related feature, which we currently have in our daily spinner
infrastructure at $WORK, would be, on a test failure, to rerun the
test up to M times (typically a small number, such as 3).  If the test
passes on a retry attempt, declare the test result "flaky" and stop
running the retries; if the test repeatedly fails after M attempts,
then the test result is "fail".  These results would be reported in
the junit XML file, and would allow the test runners to annotate their
test summaries appropriately.  (A rough sketch of the retry logic I
have in mind is appended below.)

I'm thinking about trying to implement something like this in my
copious spare time; but before I do, does the general idea seem
acceptable?

Thanks,

						- Ted
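
(Appendix: a purely illustrative sketch of the retry/"flaky"
classification described above.  This is not actual fstests code; the
real implementation would live in the "check" shell script, and
run_test() below is a made-up stand-in for whatever actually runs a
single test.)

    import random

    def run_test(test):
        # Made-up stand-in: "run" an fstests test, failing randomly
        # so the retry logic below can be exercised.
        return random.random() < 0.7

    def classify(test, max_retries=3):
        # First attempt; if it passes, no retries are needed.
        if run_test(test):
            return "pass"
        # On failure, retry up to max_retries times.  A pass on any
        # retry means the result is "flaky", and we stop retrying;
        # failing every attempt means the result is a hard "fail".
        for _ in range(max_retries):
            if run_test(test):
                return "flaky"
        return "fail"

    # Example: classify a test and report the result, which could
    # then be recorded in the junit XML summary.
    print("generic/051:", classify("generic/051"))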