Re: [PATCH] fstest: CrashMonkey tests ported to xfstest




On Thu, Nov 8, 2018 at 3:40 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Wed, Nov 07, 2018 at 01:09:22PM +1100, Dave Chinner wrote:
> > To put it another way, developers need tests focussed on finding
> > bugs quickly, not regression tests that provide the core
> > requirements of integration and release testing. The development
> > testing phase is all about finding bugs fast and efficiently.
>
> To emphasise my point about having tests and tools capable of finding
> new bugs, I noticed yesterday that fsstress and fsx didn't support
> copy_file_range, and fsx doesn't support clone/dedupe_file_range
> either. Darrick added them overnight.
>
> fsx as run by generic/263 takes *32* operations to find a data
> corruption with copy_file_range on XFS. Even changing it to do
> buffered IO instead of direct IO, it only takes ~600 operations to
> fail with a different data corruption.
>
> That's *at least* two previously unknown bugs exposed in under a
> second of runtime.
>
> That's the sort of tooling we need - we don't need hundreds of tests
> that are scripted reproducers of fixed problems, we need tools that
> exercise boundary conditions and corner cases in ways that are
> likely to expose incorrect behaviour. Tools that do these things
> quickly and in a reproducible manner are worth their weight in
> gold...
>
> IMO, Quality Engineering is not just about writing regression tests
> to keep out known bugs - its most important function is developing
> and refining new testing tools to find bugs that have escaped
> detection with existing testing methods and tools. If test engineers
> can find new bugs, software engineers can fix them. That's really
> the ultimate goal here - to find bugs and fix them before users are
> exposed to them...

Dave, I think there is some confusion about what CrashMonkey does. I
think you'll find it's very close to what you want. Let me explain.

CrashMonkey does exactly the kind of systematic testing that you want.
Given a set of system calls, it generates crash-consistency tests for
workloads composed of those system calls. It does this by testing each
system call first, then each pair of system calls, and so on. Both the
workload (which system calls to test) and the check (what the file
system should look like after crash recovery) are automatically
generated, with no human in the loop.
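
To give a feel for the enumeration, here is a rough sketch in Python
(not CrashMonkey's actual code; the operation set is illustrative):

from itertools import product

# Small, fixed set of file-system operations under test (illustrative).
OPS = ["creat", "link", "unlink", "rename", "fallocate", "write", "fsync"]

def generate_workloads(max_len=2):
    """Yield every sequence of operations of length 1..max_len."""
    for length in range(1, max_len + 1):
        for seq in product(OPS, repeat=length):
            yield seq

# 7 single-op workloads + 7*7 pairs = 56 workload skeletons, before
# each skeleton is expanded with concrete arguments.
print(sum(1 for _ in generate_workloads()))

In the real tool, each skeleton is then expanded with concrete
arguments and turned into a self-checking test.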

CrashMonkey found 10 new bugs in btrfs and F2FS, so it's not just a
suite of regression tests.

When we studied previous crash-consistency bugs reported and fixed in
the kernel, we noticed that most of them could be reproduced on a small
(100 MB) clean fs image. We also found that the arguments to the system
calls could be constrained: we only needed to reuse a small set of file
names and file ranges. We used this to automatically generate
xfstests-style tests for each file system, and we generated and tested
a total of 3.3M workloads on a research cluster at UT Austin.
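
To illustrate how small the constrained argument space is, here is a
sketch (the file names and ranges below are made up, not the exact
sets we used):

from itertools import product

# A tiny fixed namespace and a few (offset, length) pairs; the point is
# that a small set like this suffices on a small, clean image.
FILES  = ["A/foo", "A/bar", "B/foo", "B/bar"]
RANGES = [(0, 16384), (16384, 16384), (0, 65536)]

def expand_write_op():
    """Expand one 'write' skeleton into concrete (file, range) tests."""
    for path, (off, length) in product(FILES, RANGES):
        yield ("write", path, off, length)

# 4 files * 3 ranges = 12 concrete variants for this one skeleton.
print(len(list(expand_write_op())))

Multiplying a handful of argument choices per operation by the workload
skeletons is what keeps the total bounded; that is where the 3.3M
figure above comes from.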

We found that testing even a single system call revealed three new
bugs (not all of which have been patched yet). To systematically test
single system calls, you need about 300 tests; this is what Jayashree
is trying to add to fstests. Since there are currently very few
crash-consistency tests in fstests, it might be good to add more
coverage with these tests.

The CrashMonkey github repo is available here:
https://github.com/utsaslab/crashmonkey
The link to the paper: https://www.cs.utexas.edu/~jaya/pdf/osdi18-B3.pdf
Talk slides: https://www.cs.utexas.edu/~jaya/slides/osdi18-B3-slides.pdf


