Re: [LSF/MM/BPF TOPIC] Filesystem testing

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 19 Mar 2024 09:06:18 +1100

On Mon, Mar 18, 2024 at 02:48:51PM -0400, Gabriel Krisman Bertazi wrote:
> Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> writes:
> 
> > Leah Rumancik <leah.rumancik@xxxxxxxxx> writes:
> >
> >> Last year we covered the new process for backporting to XFS. There are
> >> still remaining pain points: establishing a baseline for new branches
> >> is time consuming, testing resources aren't easy to come by for
> >> everyone, and selecting appropriate patches is also time consuming. To
> >> avoid the need to establish a baseline, I'm planning on converting to
> >> a model in which I only run failed tests on the baseline. I test with
> >> gce-xfstests and am hoping to automate a relaunch of failed tests.
> >> Perhaps putting the logic to process the results and form new ./check
> >> commands could live in fstests-dev in case it is useful for other
> >> testing infrastructures.
> >
> > Nice idea. Another pain point to add -
> > 4k blocksize gets tested a lot but as soon as we switch to large block
> > size testing, either with LBS, or on a system with larger pagesize...
> > ...we quickly start seeing problems. Most of them could be testcase
> > failure, so if this could help establish a baseline, that might be helpful.
> >
> >
> > Also if we could collaborate on exclude/known failures w.r.t. different
> > test configs that might come in handy for people who are looking to help in
> > this effort. In fact, why not have different filesystems cfg files and their
> > corresponding exclude files as part of fstests repo itself?  
> > I know xfstests-bld maintains it here [1][2][3]. And it is rather
> > very convenient to point this out to anyone who asks me what test
> > configs to test with or what tests are considered to be testcase
> > failures/bugs with a given fs config.
> >
> > So it will be very helpful if we could have a mechanism such that all of
> > these fs configs (and their corresponding excludes) could be maintained in
> > fstests itself, and anyone who is looking to test any fs config should
> > quickly be able to test it with ./check <fs_cfg_params>. Has this
> > already been discussed before? Does this sound helpful for people who
> > are looking to contribute in this effort of fs testing?

Filesystem configs have already been implemented, yes? i.e. config
file sections.

We can do delta definitions like this in the config file:

RECREATE_TEST_DEV=true
TEST_MNT=/mnt/test
TEST_DEV=/dev/vda
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/vdb
MKFS_OPTIONS=
MOUNT_OPTIONS=

[xfs_4k]
MKFS_OPTIONS="-m rmapbt=1"

[xfs_4k_quota]
MKFS_OPTIONS="-m rmapbt=1"
MOUNT_OPTIONS="-o uquota,gquota,pquota"

[xfs_1k]
MKFS_OPTIONS="-m rmapbt=1 -b size=1k"
MOUNT_OPTIONS=

[xfs_n64k]
MKFS_OPTIONS="-m rmapbt=1 -n size=64k"

....

And then simply run 'check -s xfs_n64k' or '-s xfs_4k_quota' or
'-s xfs_1k', etc. to run the tests against a pre-defined filesystem
configuration.

The actual per-system customised part of the config file is the
initial device and mount definitions, all the fs config definitions
are fixed and never really change. So we could ship a config file
like the above as a template alongside config/example.config (e.g.
example.xfs.config) and then the test environment setup can simply
copy the file and use sed to rewrite the devices/mount points to
match what it is going to use...
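
As a rough sketch of that copy-and-sed step: the helper name
localize_config, the LOCAL_* variables, the device paths and the
cut-down stand-in template below are all made up for illustration.
Nothing like this ships in fstests today.

```shell
#!/bin/bash
# Hypothetical helper: copy a template config and point it at local devices.
# Only the four device/mount assignments are rewritten; the [section]
# definitions pass through untouched.
localize_config() {
	local template=$1 out=$2
	sed -e "s|^TEST_DEV=.*|TEST_DEV=${LOCAL_TEST_DEV}|" \
	    -e "s|^TEST_MNT=.*|TEST_MNT=${LOCAL_TEST_MNT}|" \
	    -e "s|^SCRATCH_DEV=.*|SCRATCH_DEV=${LOCAL_SCRATCH_DEV}|" \
	    -e "s|^SCRATCH_MNT=.*|SCRATCH_MNT=${LOCAL_SCRATCH_MNT}|" \
	    "$template" > "$out"
}

# Stand-in for the shipped template (a cut-down copy of the example above).
workdir=$(mktemp -d)
cat > "$workdir/example.xfs.config" <<'EOF'
TEST_MNT=/mnt/test
TEST_DEV=/dev/vda
SCRATCH_MNT=/mnt/scratch
SCRATCH_DEV=/dev/vdb

[xfs_4k]
MKFS_OPTIONS="-m rmapbt=1"

[xfs_1k]
MKFS_OPTIONS="-m rmapbt=1 -b size=1k"
EOF

# Example local values; these paths are invented for illustration.
LOCAL_TEST_DEV=/dev/nvme0n1p1
LOCAL_TEST_MNT=/mnt/xfstests/test
LOCAL_SCRATCH_DEV=/dev/nvme0n1p2
LOCAL_SCRATCH_MNT=/mnt/xfstests/scratch

localize_config "$workdir/example.xfs.config" "$workdir/local.config"
cat "$workdir/local.config"
```

The sed expressions are anchored at the start of the line so that
TEST_DEV does not also match SCRATCH_DEV, and '|' is used as the
delimiter so the device paths need no escaping.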

IOWs, I think the fs config thing is already a solved problem, and
we already have precedent for shipping example config files...

As for excludes - unlike fs configs, these are not static across all
test environments. They are entirely dependent on what
kernel/userspace combination is being tested and the constraints the
test runner is executing under (e.g. runtime constraints). IOWs,
every external test runner has a different set of tests that it will
need to expunge...

As it is, it would be trivial to add a config file section variable
to define an expunge file for a given config section. That way
the test runner could keep its own expunge files and add them
to the relevant section when setting up the test VM environment,
same as it would do for the devices and mounts.

That way the expunge file isn't needed on the CLI, and so the test
runner could just do 'check -s xfs_4k -s xfs_1k -s xfs_4k_quota' and
get all those configs tested and have all the local expunges for the
different configs just work....
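
Until such a section variable exists, a test runner can approximate
it from the outside with check's existing -E option. A minimal
sketch, assuming the runner keeps one <section>.expunge file per
config section in a directory of its own; that naming scheme, the
build_check_cmd helper and the expunge entry are assumptions for
illustration only:

```shell
#!/bin/bash
# Hypothetical wrapper logic: map a config section to the runner's own
# expunge file and build the matching check command line.
build_check_cmd() {
	local section=$1 expunge_dir=$2
	local cmd="./check -s $section"
	# Only pass -E if the runner actually has a list for this section.
	if [ -f "$expunge_dir/$section.expunge" ]; then
		cmd="$cmd -E $expunge_dir/$section.expunge"
	fi
	echo "$cmd"
}

# Stand-in expunge directory with one arbitrary placeholder entry.
EXPUNGE_DIR=$(mktemp -d)
echo "generic/001" > "$EXPUNGE_DIR/xfs_1k.expunge"

for section in xfs_4k xfs_1k xfs_4k_quota; do
	# Dry run: print the command instead of executing it.
	build_check_cmd "$section" "$EXPUNGE_DIR"
done
```

Note the difference from the proposal above: -E applies to the whole
check invocation, so this sketch has to run check once per section. A
per-section config variable would keep it to a single invocation.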

> > [1] [ext4]:
> > https://github.com/tytso/xfstests-bld/tree/master/test-appliance/files/root/fs/ext4/cfg
> 
> Looking at the expunge comments, I think many of those entries should
> just be turned into inline checks in the test preamble and skipped with
> _notrun.

This is the right thing to do - reduce the reliance on expunge
files, and hence get rid of the need for them in most cases
altogether. The best code is -no code-.

> The way I see it, expunged tests should be kept to a minimum,
> and the goal should be to eventually remove them from the list, IMO.
> They are tests that are known to be broken or flaky now, and can be safely
> ignored when doing unrelated work, but that will be fixed in the
> future. Tests that will always fail because the feature doesn't exist in
> the filesystem, or because it asks for an impossible situation in a
> specific configuration should be checked inline and skipped, IMO.

> +1 for the idea of having this in fstests.  Even if we
> lack the infrastructure to do anything useful with it in ./check,
> having them in fstests will improve collaboration throughout
> different fstests wrappers (kernelci, xfstests-bld, etc.)

Except that this places the maintenance burden on fstests, in
an environment where we can do -nothing- to validate the correctness
of these lists, nor have any idea of when tests should or
shouldn't be placed in these lists.

i.e. If your test runner needs to expunge tests for some reason,
either keep the expunge lists with the test runner, or add detection
to the test that automatically _notrun()s the test in environments
where it shouldn't be run....
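
To make the shape of such detection concrete, here is a standalone
sketch. The _notrun below is only a stand-in so the snippet runs
outside the harness (a real test gets it from the fstests common
code), and _require_blocksize_at_most is a hypothetical helper, not an
existing fstests function. In a real test the block size would be
queried from the filesystem under test rather than passed in as a
literal.

```shell
#!/bin/bash
# Stand-in for the harness's _notrun: report the reason and stop the test
# without counting it as a failure.
_notrun() {
	echo "[not run] $*"
	exit 0
}

# Hypothetical requirement check: skip the test when the filesystem block
# size is larger than the test was written to handle.
_require_blocksize_at_most() {
	local blksz=$1 max=$2
	[ "$blksz" -le "$max" ] || _notrun "block size $blksz exceeds $max"
}

# Each "test" runs in a subshell so the exit in _notrun ends only that test.
(_require_blocksize_at_most 4096 4096; echo "test body runs")
(_require_blocksize_at_most 65536 4096; echo "test body runs")
```

The point is that the knowledge of when the test cannot work lives in
the test itself, so no external list has to track it.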

I'd much prefer the improvement of _notrun detection over spreading
the expunge file mess further into fstests. This helps remove the
technical debt (lack of proper checking in the test) rather than
kicking it down the road for someone else to have to deal with in
future.

Centralisation of third party expunge file management is not the
answer.  We should be trying to reduce our reliance on expunges and
the maintenance overhead they require, not driving that expunge file
maintenance overhead into fstests itself...

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx