Re: [RFC: kdevops] Standardizing on failure rate nomenclature for expunges

On Thu, May 19, 2022 at 10:18:48AM -0400, Theodore Ts'o wrote:
> On Thu, May 19, 2022 at 07:24:50PM +0800, Zorro Lang wrote:
> > 
> > Yes, we talked about this, but if I remember correctly, I recommended that each
> > downstream tester maintain their own "testing data/config", like exclude
> > lists, failure ratios, known failures, etc. I think they're not suitable to be
> > fixed in the mainline fstests.
> 
> Failure ratios are the sort of thing that is only applicable to:
> 
> * A specific filesystem
> * A specific configuration
> * A specific storage device / storage device class
> * A specific CPU architecture / CPU speed
> * A specific amount of memory available

And a specific bug, I suppose :)

> 
> Put another way, there are problems that fail so rarely as to be
> "never" on, say, an x86_64 class server with gobs and gobs of
> memory, but which can more reliably fail on, say, a Raspberry Pi using
> eMMC flash.
> 
> I don't think that Luis was suggesting that this kind of failure
> annotation would go in upstream fstests.  I suspect he just wants to
> use it in kdevops, and hopes that other people would use it as well in
> other contexts.  But even in the context of test runners like kdevops
> and {kvm,gce,android}-xfstests, it's going to be very specific to a
> particular test environment, rather than something that belongs in the
> global list of excludes for a particular file system.  So in the
> gce-xfstests context, this is the difference between the excludes in
> the files:
> 
> 	fs/ext4/excludes
> vs
> 	fs/ext4/cfg/bigalloc.exclude
> 
> Even if I only cared about, say, how things ran on GCE using
> SSD-backed Persistent Disk (never mind that I can also run
> gce-xfstests on Local SSD, PD Extreme, etc.), failure percentages
> would never make sense for fs/ext4/excludes, since that covers
> multiple file system configs.  And my infrastructure supports kvm,
> gce, and Android, as well as some people (such as at $WORK for our
> data center kernels) who run the test appliance directly on bare
> metal, so I wouldn't use the failure percentages in these files, etc.
> 
> Now, what I *do* is track this sort of thing in my own notes, e.g.:
> 
> generic/051	ext4/adv	Failure percentage: 16% (4/25)
>     "Basic log recovery stress test - do lots of stuff, shut down in
>     the middle of it and check that recovery runs to completion and
>     everything can be successfully removed afterwards."
> 
> generic/410 nojournal	Couldn't reproduce after running 25 times
>      "Test mount shared subtrees, verify the state transitions..."
> 
> generic/68[12]	encrypt   Failure percentage: 100%
>     The directory does grow, but blocks aren't charged to either root or
>     the non-privileged users' quota.  So this appears to be a real bug.
> 
> 
> There is one thing that I'd like to add to upstream fstests, and that
> is some kind of option so that "check --retry-failures NN" would cause
> fstests, upon finding a test failure, to automatically rerun that
> failing test NN additional times.

That makes more sense to me :) I'd like to help testers retry the
(randomly) failing cases, so they can gather their own testing statistics. That's
better than recording these statistics in fstests itself.
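
Today testers basically have to do that by hand, with a wrapper loop roughly like
the one below (only a sketch: the test name and iteration count are made up, and
it assumes ./check returns a nonzero exit status when the test fails):

    # Rerun a single test N times and count how often it fails,
    # to estimate its failure ratio in this particular environment.
    # (Sketch only; run from the fstests directory with a working config.)
    N=25
    fail=0
    for i in $(seq 1 $N); do
        ./check generic/051 >/dev/null 2>&1 || fail=$((fail + 1))
    done
    echo "generic/051 failed $fail/$N times"

Having check itself do the retries and report them would save everyone from
carrying around wrappers like that.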

> Another potential related feature
> which we currently have in our daily spinner infrastructure at $WORK
> would be, on a test failure, to rerun the test up to M times (typically a
> small number, such as 3), and if it passes on a retry attempt, declare
> the test result as "flaky" and stop running the retries.  If the test
> repeatedly fails after M attempts, then the test result is "fail".
> 
> These results would be reported in the junit XML file, and would allow
> the test runners to annotate their test summaries appropriately.
> 
> I'm thinking about trying to implement something like this in my
> copious spare time; but before I do, does the general idea seem
> acceptable?

After a "./check ..." run completes, fstests generally shows three lists:
  Ran: ...
  Not run: ...
  Failures: ...

So you mean that if "--retry-failures N" is specified, we could have one more list
named "Flaky", which is a subset of the "Failures" list, like:
  Ran: ...
  Not run: ...
  Failures: generic/388 generic/475 xfs/104 xfs/442
  Flaky: generic/388 [2/N] xfs/104 [1/N]

If I understand this correctly, it's acceptable to me. And it might be helpful
for Amir's situation. But let's hear more voices from other developers; if there
is no big objection from other fs maintainers, let's do it :)
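
And if the flaky status also lands in the junit-style results file you mentioned,
I could imagine each flaky case being annotated roughly like this (purely my guess
at the markup; the "flaky" element and its attributes don't exist today):

    <testcase classname="xfstests" name="generic/388" time="217">
        <!-- hypothetical: failed initially, then passed on retry 2 of N -->
        <flaky retries="2"/>
    </testcase>

That would let test runners pick the flaky cases out of the results without
reparsing the console summary.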

BTW, what do you think about the new group name to mark cases with random
load/operations/environments? Any suggestions or good names for that?
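
(For reference, such a group would just be one more tag in each test's preamble,
e.g. something like the following, where <new-group> is only a placeholder:

    . ./common/preamble
    _begin_fstest auto quick <new-group>

so the main open question is really just what to call it.)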

Thanks,
Zorro

> 
> Thanks,
> 
> 					- Ted
> 



