Re: [PATCH 2/2] fstests: add configuration option for executing post mkfs commands

Dave Chinner <david@xxxxxxxxxxxxx> · Sat, 7 Oct 2023 09:12:33 +1100

On Fri, Oct 06, 2023 at 05:16:31PM +1030, Qu Wenruo wrote:
> However for the whole btrfs/fstests combination, we have several
> features which can not be easily integrated into fstests.
> 
> The biggest example is multi-device management.
> 
> For now, only some btrfs specific test cases are utilizing
> SCRATCH_DEV_POOL to cover multi-device functionality (including all the
> RAID and seed support).
> This means way less coverage for seed and btrfs RAID, all generic group
> would not utilize btrfs RAID/seed functionality at all.

IOWs, you are saying that the btrfs device setup code in fstests is
functionally deficient.

> 
> For a better coverage, or for more complex setup (maybe dm-dust for XFS
> log device?), I am not that convinced if the current plain mkfs options
> is good enough.

We already know mkfs alone isn't sufficent - that's why we have
filesystem specific mkfs fucntions for any filesystem that needs to
do something more complex than run mkfs....

i.e. we already have infrastructure that we use to solve this
problem - there are example implementations that you can look at to
follow.

> 
> Thus I'm more interested in exploring the possibility to "out-source"
> those basic functionality (from mkfs to check) to outside scripts, as
> we're not that far away to hit the limits of the existing framework. (At
> least for btrfs)

The whole idea that we set up devices for testing via magic,
undocumented, private external scripts is antithetical to the
purpose of fstests. The device model used in fstests is that you tell it
what configuration you want, and it does all the work to set them up
that way. This allows tests to override or skip incompatible
configurations based on known config variables, etc.

It also allows -everyone- to test complex configurations without
needing to share private, external scripts or knowing any of the
intricate details needed to set up that configuration. External
scripts are like proprietary code - it only works if you have some
magic secret sauce that nobody else knows about.

If it's hard to set something up in fstests, then *fix that
problem*. If you are adding code in environment variables and
hacking in environment varaibles to run that code, then the -code
itself- should be in fstests.

Having the code in fstests means that anyone can add
"BTRFS_SCRATCH_UUID='<uuid>' to their config file to change uuids
for the devices being tested. They don't need to know waht magic
command is needed to do this, when it needs to be set, what changes
elsewhere in fstests they need to watch out for, which tests is
might conflict with, etc.

Hiding this in some custom script means it can't be easily
documented, can't be easily or widely replicated, it can't be
discovered by reading the fstests code, and it isn't obvious to
-anyone- that it is part of the btrfs test matrix that needs to be
exercised.

IOWs, it's just really bad QA architecture to externalise random
parts of the test environment configuration.  If the configuration
needs to be tested, then the infrastructure should support that
directly and it should be easily discoverable and used by people
largely unfamiliar with btrfs volume management (i.e. typical distro
QA environment).

> > I suppose the problem there is that mkfs.btrfs won't itself create a
> > filesystem with the metadata_uuid field that doesn't match the other
> > uuid?
> 
> That's not a big deal, we (at least me) are very open to add this mkfs
> feature.
> 
> But there are other limits, like the fsck part.
> 
> For now, btrfs follows the behavior of other fses, just check the
> correctness of the metadata, and ignore the correctness of data.
>
> But remember btrfs has data checksum by default, thus it can easily
> verify the data too, and we have the extra switch ("--check-data-csum"
> option) to enable that for "btrfs check".

Which is yet another arguement for the code being in fstests and
controlled by an environment variable.

This is *exactly* the case for the LARGE_SCRATCH_DEV stuff that ext4
and XFS support in the mkfs routines. On the XFS side we have
LARGE_SCRATCH_DEV checks in -both- the XFS mkfs and check/repair
functions to handle this configuration correctly.

IOWs, what you want to do is add a config variable for
BTFS_SCRATCH_CHECK_DATA, and trigger off that in all btrfs specific
functions that need to add, modify or check data checksums.

> For now we're not going to enable the "--check-data-csum" option nor we
> have the ability to teach fstests how to change the behavior.

We most certainly do have the ability to do this in fstests, and
quite easily.

Another example is the USE_EXTERNAL variable that tells XFS and ext4
that external log devices (and rt devices for XFS) are to be used.
This has hooks all over mkfs, mount, check, repair, xfs_db, quota
and fs population functions so that they all specify devices
appropriately.

That is, this config variable directly modifies the command lines
used for these operations - it is an even better example of FS
specific device configuration driving by config variables than
LARGE_SCRATCH_DEV.  This model will work just fine for stuff like
the --check-data-csum btrfs specific check option being talked about
here, and the only thing that needs to change is the btrfs specific
check/repair functions...

> Thus I'm taking the chance to explore any way to "out-source" those
> mkfs/fsck functionality, even this means other fses may not even bother
> as the current framework just works good enough for them.

And as I said above, that's the wrong model for fstests - it means
that a typical QA environment is not going to be able to test
complex things because the people running the tests do not know how
to write these complex "out-sourced" scripts to configure the test
environment.

Having all the code in fstests and triggering it via a config
variable is the right way to do this sort of thing. It works for
everyone and it's easy to replicate the test environment and
configurations for reproduction of issues that are found.

If the test envirnoment is dependent on private scripts for
configuration and reproduction of issues, then how do other people
reproduce the problems you might find? Yeah, you have to share all
your scripts for everyone to run, and at that point the code
actually needs to be in fstests itself because it's proven to be a
useful test configuration that everyone should be running....

> But IIRC, even f2fs is gaining multi-device support, I believe this is
> not a btrfs specific thing, but a framework limitation.

The scratch dev pool was an easy extension to support multi-device
btrfs filesystems done in the really early days when there was
almost zero btrfs specific test coverage in fstests. I'm not
surprised that it has warts and may not do everything that btrfs
developers might need these days.

However, we don't need custom hooks to externalise scripts - we
already have a working model for config driven filesystem specific
device configuration. I don't see that there is any major common
infrastructure change needed, most of what I'm hearing is that the
btrfs specific device configuration needs to catch up with how other
filesystems have been testing complex device configurations....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx