[LSF/MM TOPIC] improving storage testing

This should probably be folded into other testing proposals, but I'd
like to discuss ways that we can improve storage and file system
testing.  Specifically,

1) Adding some kind of "smoke test" group.  The "quick" group in
xfstests is no longer terribly quick.  Using gce-xfstests, the time to
run the quick group on f2fs, ext4, btrfs, and xfs is 17 minutes, 18
minutes, 25 minutes, and 31 minutes, respectively.  It probably won't
be too contentious to come up with some kind of criteria --- stress
tests plus maybe a few tests added to maximize code coverage, with the
goal of having the smoke test run in 5-10 minutes for all major file
systems.
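One way such a group could be picked semi-automatically, sketched in Python under heavy assumptions: suppose each test's runtime on a given file system is known, and some per-test coverage data (for instance, from a gcov-instrumented kernel) is available. The `Case` type, the `pick_smoke_tests` helper, and the sample data in the usage note are hypothetical and are not part of xfstests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    runtime_s: float      # measured runtime of this test on one file system
    is_stress: bool       # stress tests are taken first
    covered: frozenset    # code locations the test exercises (hypothetical data)


def pick_smoke_tests(tests, budget_s):
    """Greedily choose a smoke-test list whose total runtime fits budget_s."""
    chosen, covered, used = [], set(), 0.0

    # Step 1: take every stress test that still fits in the time budget.
    for t in tests:
        if t.is_stress and used + t.runtime_s <= budget_s:
            chosen.append(t.name)
            covered |= t.covered
            used += t.runtime_s

    # Step 2: repeatedly add the test that covers the most not-yet-covered
    # locations per second of runtime, until nothing useful still fits.
    remaining = [t for t in tests if not t.is_stress]
    while True:
        best, best_score = None, 0.0
        for t in remaining:
            if used + t.runtime_s > budget_s:
                continue
            score = len(t.covered - covered) / max(t.runtime_s, 1e-9)
            if score > best_score:
                best, best_score = t, score
        if best is None:
            break
        chosen.append(best.name)
        covered |= best.covered
        used += best.runtime_s
        remaining.remove(best)

    return chosen, used
```

With invented data such as a 120-second stress test covering {a, b}, plus `t1` (10 s, {b, c}), `t2` (5 s, {c}) and `t3` (200 s, {d, e, f}), a 300-second budget yields `["stress1", "t2"]` using 125 seconds.  This is the usual greedy heuristic for budgeted coverage, so it is cheap to rerun whenever timings change, but it is not guaranteed to find the best possible selection.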

Perhaps more controversial might be some way of ordering the tests so
that the ones which are most likely to fail if a bug has been
introduced are run first, so that we can have a "fail fast" sort of
system.
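If pass/fail history were recorded per test, a fail-fast order could be computed from it.  The sketch below uses hypothetical names and invented history, and it assumes test failures are independent.  Under that assumption, sorting by estimated failure probability divided by runtime minimizes the expected time until the first failure; this follows from a simple exchange argument on any two adjacent tests.

```python
from dataclasses import dataclass


@dataclass
class RunHistory:
    name: str
    runs: int          # how many times this test has been run (hypothetical)
    failures: int      # how many of those runs failed
    runtime_s: float   # typical runtime in seconds


def failure_prob(h: RunHistory) -> float:
    # Laplace smoothing: a test with no history is not assumed to be perfect.
    return (h.failures + 1) / (h.runs + 2)


def fail_fast_order(history):
    """Order tests by estimated failure probability per second, highest first.

    If failures were independent, this ordering minimizes the expected
    time until the first failure is observed.
    """
    return sorted(history,
                  key=lambda h: failure_prob(h) / h.runtime_s,
                  reverse=True)
```

Note that the smoothing puts a never-run, long test near the end rather than the front; a runner that wants new tests exercised early would need a different prior.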

2) Documenting what the known failures should be for various tests on
different file systems and kernel versions.  I think we all have our
own way of excluding tests which are known to fail.  One extreme case
is generic/484: the test was added to xfstests, but the patch fixing
the underlying bug got hung up because it was somewhat controversial,
so the test was failing on all file systems.

Other cases might be when the fix for a particular test failure is too
complex to backport to stable (maybe because it would drag in all
sorts of other changes in other subsystems), so that test is Just
Going To Fail for a particular stable kernel series.

It probably doesn't make sense to do this in xfstests itself, which is
why we all have our own individual test runners layered on top of
xfstests.  But if we want to automate running xfstests for stable
kernel series, some way of annotating fixes for different kernel
versions would be useful; perhaps some kind of centralized clearing
house for this information would make sense.
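As a rough illustration of what such annotations might look like, here is a Python sketch of a known-failures table keyed by test, file system, and kernel version.  The `KnownFailure` format, its version-comparison rules, and the example entries (`example/001`, `example/002`, and all version numbers) are invented; they do not describe any existing test runner's exclude format or any real fix.

```python
from dataclasses import dataclass, field
from typing import Optional


def parse_version(v: str) -> tuple:
    """'4.19.13' -> (4, 19, 13); '5.0-rc3' -> (5, 0, 0)."""
    parts = [int(p) for p in v.split("-")[0].split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


@dataclass
class KnownFailure:
    test: str                       # hypothetical test name
    fs: str                         # file system name, or "*" for all
    reason: str
    # Stable series (major, minor) -> first release in that series with the
    # fix, or None if the fix will never be backported to that series.
    fixed_in: dict = field(default_factory=dict)
    # First mainline release with the fix; None means not fixed yet.
    fixed_upstream: Optional[str] = None


def is_expected_failure(db, test, fs, kernel):
    """Return (True, reason) if `test` is a known failure on fs/kernel."""
    kv = parse_version(kernel)
    series = kv[:2]
    for kf in db:
        if kf.test != test or kf.fs not in ("*", fs):
            continue
        if series in kf.fixed_in:
            fix = kf.fixed_in[series]
            if fix is None or kv < parse_version(fix):
                return True, kf.reason
            continue
        # Series without an explicit entry: fall back to the upstream fix.
        if kf.fixed_upstream is None or kv < parse_version(kf.fixed_upstream):
            return True, kf.reason
    return False, ""


# Invented example entries.
DB = [
    KnownFailure("example/001", "*", "fix still under review"),
    KnownFailure("example/002", "ext4", "fix too invasive to backport",
                 fixed_in={(4, 14): None, (4, 19): "4.19.20"},
                 fixed_upstream="5.0"),
]
```

Keeping the data declarative like this is what would make a shared clearing house practical: each runner could consume the same table instead of maintaining its own exclude lists.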

3) Making blktests more stable/useful.  For someone who is not a block
layer specialist, it can be hard to determine whether the problem is a
kernel bug, a kernel misconfiguration, some userspace component (such
as nvme-cli) being out of date, or just a test bug.  (For example, all
srp/* tests are currently failing in blktests upstream; I had to pull
some not-yet-merged commits from Bart's tree in order to fix bugs that
caused all of srp to fail.)

Some of the things that we could do include documenting what kernel
CONFIG options are needed to successfully run blktests, perhaps using
a defconfig list.
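A minimal sketch of how a runner could check a kernel `.config` against such a list before starting.  The three symbols below are real Kconfig options (null_blk, scsi_debug, and the NVMe loopback target), but treating them as the blktests requirements is an assumption made only for this example; the helper names are hypothetical.

```python
import re

# Hypothetical requirement list: an example of the kind of fragment such
# documentation could ship, NOT an authoritative list of blktests needs.
REQUIRED = {
    "CONFIG_BLK_DEV_NULL_BLK": ("y", "m"),
    "CONFIG_SCSI_DEBUG": ("y", "m"),
    "CONFIG_NVME_TARGET_LOOP": ("y", "m"),
}

# Matches "CONFIG_FOO=value"; "# CONFIG_FOO is not set" lines do not match.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_kconfig(text: str) -> dict:
    """Map each set CONFIG_* option in a .config file to its value."""
    opts = {}
    for line in text.splitlines():
        m = _SET.match(line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def missing_options(config_text: str, required=REQUIRED) -> list:
    """Return the required options that are unset or set to a bad value."""
    opts = parse_kconfig(config_text)
    return sorted(name for name, ok in required.items()
                  if opts.get(name) not in ok)
```

Reporting every missing option up front, rather than letting individual tests fail or skip one by one, is what would help someone who is not a block layer specialist tell a misconfiguration apart from a kernel bug.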

Also, there are expectations about the minimum version of bash that is
supported, but there aren't necessarily any for other components such
as nvme-cli, and I suspect that some of the failures I'm seeing are
due to my use of an overly new version of nvme-cli from its git tree.
Is that supposed to work, or should I constrain myself to whatever
version is being shipped in Fedora or some other reference
distribution?  More generally, what are the overall expectations?
xfstests has an extremely expansive set of sed scripts to normalize
shell script output and make xfstests extremely portable; are patches
along similar lines something that we should be doing for blktests?
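To illustrate the kind of normalization meant here, the following Python sketch mimics what the xfstests sed filters do: environment-specific strings in a test's output are replaced with fixed placeholders so the output can be compared against a golden output file.  The device path, mount point, placeholder names, and timestamp format are assumptions for the example, not values taken from blktests.

```python
import re


def make_filter(substitutions):
    """Build a filter from (regex, replacement) pairs applied in order,
    in the spirit of a sed pipeline."""
    compiled = [(re.compile(p), r) for p, r in substitutions]

    def apply(output: str) -> str:
        for pat, repl in compiled:
            output = pat.sub(repl, output)
        return output

    return apply


# Hypothetical environment-specific values that leak into test output.
TEST_DEV = "/dev/nvme0n1"
TEST_MNT = "/mnt/test"

normalize = make_filter([
    # \b keeps /dev/nvme0n1 from matching a prefix of /dev/nvme0n10.
    (re.escape(TEST_DEV) + r"\b", "TEST_DEV"),
    (re.escape(TEST_MNT) + r"\b", "TEST_DIR"),
    # Timestamps differ on every run, so collapse them to a placeholder.
    (r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "TIMESTAMP"),
])
```

Order matters in such filters, just as it does in a sed pipeline: a broader pattern applied first can rewrite text that a later, more specific pattern was meant to match.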

Cheers,

					- Ted


