Re: [LSF/MM TOPIC] improving storage testing

Omar Sandoval <osandov@xxxxxxxxxxx> · Thu, 14 Feb 2019 13:56:34 -0800

On Wed, Feb 13, 2019 at 01:07:54PM -0500, Theodore Y. Ts'o wrote:
> This should probably be folded into other testing proposals but I'd
> like to discuss ways that we can improve storage and file systems
> testing.  Specifically,
> 
> 1) Adding some kind of "smoke test" group.  The "quick" group in
> xfstests is no longer terribly quick.  Using gce-xfstests, the time to
> run the quick group on f2fs, ext4, btrfs, and xfs is 17 minutes, 18
> minutes, 25 minutes, and 31 minutes, respectively.  It probably won't
> be too contentious to come up with some kind of criteria --- stress
> tests plus maybe a few tests added to maximize code coverage, with the
> goal of the smoke test to run in 5-10 minutes for all major file
> systems.
> 
> Perhaps more controversial might be some way of ordering the tests so
> that the ones which are most likely to fail if a bug has been
> introduced are run first, so that we can have a "fail fast" sort of
> system.
> 
> 2) Documenting what the known failures should be for various tests on
> different file systems and kernel versions.  I think we all have our
> own way of excluding tests which are known to fail.  One extreme case
> is where the test case was added to xfstests (generic/484), but the
> patch to fix it got hung up because it was somewhat controversial, so
> it was failing on all file systems.
> 
> Other cases might be when fixing a particular test failure is too
> complex to backport to stable (maybe because it would drag in all
> sorts of other changes in other subsystems), so that test is Just
> Going To Fail for a particular stable kernel series.
> 
> It probably doesn't make sense to do this in xfstests, which is why we
> all have our own individual test runners that are layered on top of
> xfstests.  But if we want to automate running xfstests for stable
> kernel series, some way of annotating fixes for different kernel
> versions would be useful, perhaps as some kind of centralized
> clearinghouse for this information.
> 
> 3) Making blktests more stable/useful.  For someone who is not a block
> layer specialist, it can be hard to determine whether the problem is a
> kernel bug,

From my experience with running xfstests at Facebook, the same thing
goes for xfstests :) The filesystem developers on the team are the only
ones that can make sense of any test failures.

> a kernel misconfiguration

In theory, every test should verify that the kernel is configured
correctly and skip the test if not, just like xfstests.
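
As a rough illustration of that check-and-skip pattern, here is a minimal
sketch in Python. It is not how blktests does it (blktests is bash and has
its own helpers); the function names and the sample config text are made up
for the example, and a real check would read the running kernel's config
instead of a string.

```python
import re


def parse_kernel_config(text):
    """Return {option: value} for every CONFIG_* option set in a .config."""
    options = {}
    for line in text.splitlines():
        # Lines like "# CONFIG_FOO is not set" start with '#', so they
        # do not match and the option is simply treated as absent.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(config, required):
    """List required options that are neither built in (y) nor modular (m)."""
    return [opt for opt in required if config.get(opt) not in ("y", "m")]


def skip_reason(config, required):
    """Return a skip message, or None if the test can run."""
    missing = missing_options(config, required)
    if missing:
        return "kernel config lacks: " + ", ".join(missing)
    return None


# Hypothetical sample config, for illustration only.
SAMPLE = """\
CONFIG_BLK_DEV_NULL_BLK=m
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_NVME_CORE=y
"""

config = parse_kernel_config(SAMPLE)
print(skip_reason(config, ["CONFIG_BLK_DEV_NULL_BLK", "CONFIG_BLK_DEV_LOOP"]))
```

The point is that a missing option turns into an explicit "skipped, and
here is why" rather than a failure that someone has to debug.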

> some userspace component (such as nvme-cli) being out of date or just
> a test bug.  (For example, all srp/* tests are currently failing in
> blktests upstream; I had to pull some not-yet-merged commits from
> Bart's tree in order to fix bugs that caused all of srp to fail.)
> 
> Some of the things that we could do include documenting what kernel
> CONFIG options are needed to successfully run blktests, perhaps using
> a defconfig list.

Have you encountered issues where missing config options have caused
test failures? Or do you want the config options for maximum coverage? If
you have examples of the former, I'll fix them up. For the latter, I
have a list somewhere that I can add to the blktests repository.

> Also, there are expectations about minimum versions of bash that can
> be supported; but there aren't necessarily for other components such
> as nvme-cli, and I suspect that it is due to the use of an overly new
> version of nvme-cli from its git tree.  Is that supposed to work, or
> should I constrain myself to whatever version is being shipped in
> Fedora or some other reference distribution?  More generally, what
> are the overall expectations?

My (undocumented) rule of thumb has been that blktests shouldn't assume
anything newer than whatever ships on Debian oldstable. I can document
that requirement.

For specific tests that require a newer feature, the test _should_ check
that the feature is available. Please report any tests where that isn't
the case, although I'll likely defer to the contributors for
nvme/srp/zbd issues.

> xfstests has an
> extremely expansive set of sed scripts to normalize shell script
> output to make xfstests extremely portable; are patches along similar
> lines something that we should be doing for blktests?

Yup, we've added a couple of these. We should add more as needed.
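
For anyone unfamiliar with the approach, the idea is to rewrite
environment-specific parts of a command's output before comparing it with
the expected output. A minimal sketch in Python (the real filters in
xfstests and blktests are sed/shell; the function name and the "TEST_DEV"
placeholder here are assumptions for the example):

```python
import re


def filter_output(text, device):
    """Normalize tool output so it compares equal across machines."""
    # Replace the machine-specific device path with a fixed placeholder.
    # The trailing \b keeps partitions such as /dev/sdb1 untouched.
    text = re.sub(re.escape(device) + r"\b", "TEST_DEV", text)
    # Collapse runs of spaces and tabs, which vary between tool versions.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop trailing spaces at the end of each line.
    text = re.sub(r" +$", "", text, flags=re.MULTILINE)
    return text


raw = "/dev/sdb:   512   bytes  \n/dev/sdb1 ok\n"
print(filter_output(raw, "/dev/sdb"), end="")
```

Each filter stays small and targets one source of variation, so adding
another one when a new tool version changes its formatting is cheap.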

blktests is new, so we have some rough edges, but I'd like to think that
we're trying to do the right things. Please report the cases where we're
not and we'll get them fixed up.