On Tue, Mar 08, 2022 at 02:06:57PM -0500, Sasha Levin wrote:
> What we can't do is invest significant time into doing the testing work
> ourselves for each and every subsystem in the kernel.

I think this experience helps though; it gives you a better appreciation
for the concerns we have when merging any fix, and for the effort and
diligence required to ensure we don't regress. I think the kernel-ci
steady state goal takes this a bit further.

> The testing rig I had is expensive, not even just time-wise but also
> w.r.t the compute resources it required to operate, I suspect that most
> of the bots that are running around won't dedicate that much resources
> to each filesystem on a voluntary basis.

Precisely because of the above, one of *my* requirements for building a
kernel-ci system was to ensure I can run my tests regardless of which
employer I am at, and ramp up easily. So I can use local virtualized
solutions (KVM or VirtualBox), or *any* cloud solution at will (AWS,
GCE, Azure, OpenStack). kdevops enables all of this using the same
commands I posted before, simple make targets (a rough sketch is
appended at the end of this mail).

Perhaps the one area that might interest folks is the test setup, which
uses loopback drives backed by truncated files; if you find holes in
this please let me know:

https://github.com/mcgrof/kdevops/blob/master/docs/testing-with-loopback.md

In my experience this setup finds *more* issues, rather than fewer, and
in my experience none of the issues it found were bogus, they always
led to real bugs:

https://github.com/mcgrof/kdevops/blob/master/docs/seeing-more-issues.md

A test rig for a high kernel-ci steady state goal does require
resources, time and effort. Fortunately I am now confident in the
architecture behind the tests / automation. So all that is really
needed now is a dedicated system to run these, agreement on which
configs we'd test (I have some well defined and documented for XFS on
kdevops through Kconfig, based on the conversations we last had about
stable testing), a public baseline reflecting that setup (I have public
baselines already published for tons of kernels and for different
filesystems), and then testing of candidate fixes. This latter effort
is still time consuming too. But with a proper rig running a kernel-ci
on an ongoing basis, this becomes much easier and much smoother
sailing.

> I can comment on what I'm seeing with Google's COS distro: it's a
> chicken-and-egg problem. It's hard to offer commercial support with the
> current state of xfs, but on the other hand it's hard to improve the
> state of xfs without a commercial party that would invest more
> significant resources into it.

This is the non-Enterprise argument for it. And yes, I agree, but it
doesn't mean we can't resolve it. I think agreeing on a dedicated test
rig, a test setup, and a public baseline might be a good next step.

> Luckily there is an individual in Google who has picked up this work and
> hopefully we will see something coming out of it very soon, but honestly
> - we just got lucky.

Groovy.

  Luis
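
To make the "simple make targets" above concrete, here is a rough
sketch of the workflow. Treat the exact target names as illustrative
and check the kdevops documentation for your version rather than
copying these blindly:

    make menuconfig     # pick virt/cloud provider, filesystem, test config
    make                # generate the provisioning / ansible bits
    make bringup        # bring up the local guests or cloud nodes
    make linux          # build and install the kernel under test
    make fstests        # install fstests on the target nodes
    make fstests-baseline   # run the baseline for the chosen filesystem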
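
As for what the loopback setup boils down to: sparse (truncated) files
attached to loopback devices, which fstests then treats like any other
block device. A minimal hand-rolled equivalent, with purely
illustrative paths and sizes (kdevops automates all of this for you):

    # Create a sparse backing file; it only consumes space as blocks
    # are actually written.
    truncate -s 20G /media/sparse/fstests-dev0.img

    # Attach it to the first free loopback device and print its name,
    # e.g. /dev/loop0.
    losetup --find --show /media/sparse/fstests-dev0.img

    # From here the loop device can be formatted and handed to fstests
    # like a regular disk.
    mkfs.xfs -f /dev/loop0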