Re: XFS LTS backport cabal

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 26 May 2022 11:01:43 -0400

On Wed, May 25, 2022 at 02:23:10PM -0700, Darrick J. Wong wrote:
> 
> 2. Some other tag for patches that could be a fix, but need a few months
> to soak.  This is targetted at (b), since I'm terrible at remembering
> that there are patches that are reaching ripeness.

What I'd suggest here is a simple "Stable-Soak: <date>|<release>" tag.
It wouldn't need to be official, and so we don't need to get the
blessing of the Stable Tree maintainers; it would just be something
that would be honored by the "XFS LTS backport cabal".
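
To make that concrete, here is a rough sketch of how a backport script
might read such a tag.  Nothing like this exists today as far as I know.
The value formats are my assumptions, since the tag is only a suggestion:
a date is taken to be ISO YYYY-MM-DD and a release to look like "v5.19".
The names parse_stable_soak and is_ripe are hypothetical.

```python
import re
from datetime import date

# Assumed syntax: "Stable-Soak: 2022-08-01" or "Stable-Soak: v5.19".
# The thread does not define the value format; these are illustrative.
_TAG = re.compile(r"^Stable-Soak:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
_RELEASE = re.compile(r"^v\d+\.\d+$")


def parse_stable_soak(commit_message):
    """Return ("date", date), ("release", str), or None if no usable tag."""
    match = _TAG.search(commit_message)
    if match is None:
        return None
    value = match.group(1)
    if _RELEASE.match(value):
        return ("release", value)
    try:
        return ("date", date.fromisoformat(value))
    except ValueError:
        return None


def is_ripe(commit_message, today, released):
    """True once the soak date has passed or the named release is out.

    `released` is a set of release names the caller considers shipped.
    """
    parsed = parse_stable_soak(commit_message)
    if parsed is None:
        return False
    kind, value = parsed
    if kind == "date":
        return today >= value
    return value in released
```

A periodic job could walk the candidate commits and list those for which
is_ripe() returns true, which addresses the "remembering that patches are
reaching ripeness" problem without any official tooling.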

> a> I've been following the recent fstests threads, and it seems to me
> that there are really two classes of users -- sustaining people who want
> fstests to run reliably so they can tell if their backports have broken
> anything; and developers, who want the randomness to try to poke into
> dusty corners of the filesystem.  Can we make it easier to associate
> random bits of data (reliability rates, etc.) with a given fstests
> configuration?  And create a test group^Wtag for the tests that rely on
> RNGs to shake things up?

In my experience, tests that have flaky results fall into two
categories; ones that are trying to do traditional fuzzing, and
those that are running stress tests either by themselves, or as
antagonists against some other operation --- e.g., running fsstress
while trying to do an online resize, or while trying to shut down the
file system, etc.

Some of these stress tests do use a PRNG, but even if we fixed the
seed to some value (such as 0), many of the test results would still
be potentially flaky.  These test results also tend to be very timing
dependent; so these are the tests whose failure rate varies
depending on whether the test devices are on a loop device, eMMC flash
device, HDD, SSD, or various cloud virtual block devices, such as
AWS's EBS or GCE's PD devices.
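
If we did want to associate reliability rates with a configuration, one
minimal way to look at it is to key the failure rate on both the test and
the kind of block device.  This is only a sketch: the tuple layout and the
function name failure_rates are made up for illustration, and the sample
rows are invented, not real results.

```python
from collections import defaultdict


def failure_rates(runs):
    """Compute per-(test, device) failure rates.

    `runs` is an iterable of (test_name, device_kind, passed) tuples.
    Returns a dict mapping (test_name, device_kind) to a float in [0, 1].
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [failures, total runs]
    for test, device, passed in runs:
        entry = totals[(test, device)]
        entry[1] += 1
        if not passed:
            entry[0] += 1
    return {key: fails / count for key, (fails, count) in totals.items()}


# Invented sample data, purely to show the shape of the input.
sample = [
    ("stress-test-a", "loop", True),
    ("stress-test-a", "loop", False),
    ("stress-test-a", "ssd", True),
]
for (test, device), rate in sorted(failure_rates(sample).items()):
    print(f"{test} on {device}: {rate:.0%} failures")
```

The point of keying on the device kind is that the same test can look
solid on one kind of storage and flaky on another, so a single per-test
number would hide exactly the variation described above.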

These tests are often very useful, since if we are missing a lock when
accessing some data structure, these tests which use stress test
programs are the most likely way of noticing such problems.  So I don't
think we would want to exclude them; and if we're only excluding those
tests which are doing fuzz testing, I'm not sure it would really move the
needle.

> b> Testing relies very heavily on being able to spin up a lot of testing
> resources.  Can/should we make it easier for people with a kernel.org
> account to get free(ish) cloud accounts with the LF members who are also
> cloud vendors?

If anyone wants to use gce-xfstests, I'm happy to work on sponsoring
some GCE credits for that purpose.  One of the nice things about
gce-xfstests is that test VMs are only left running while a test is
actually running.  Once a test is finished, the VM shuts itself down.
And if we want to run a number of file system configs, we can spawn a
dozen VMs, one for each fsconfig, and when they are done, each VM
shuts itself down except for a small test manager which collates
the results into a single report.  This makes gce-xfstests much more
cost efficient than those schemes which keep a VM up and running at
all times, whether it is running tests or not.

Cheers,

						- Ted