Re: [PATCH] xfs/larp: Make test failures debuggable

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 06, 2022 at 09:14:42AM -0700, Darrick J. Wong wrote:
> On Fri, May 06, 2022 at 05:51:41PM +1000, Dave Chinner wrote:
> > Hi Catherine,
> > 
> > These are all the mods I needed to make to be able to understand the
> > test failures I was getting as I debugged the new LARP recovery
> > algorithm I've written.  You'll need to massage the test number in
> > this patch to apply it on top of your patch.

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

> > I haven't added any new test cases yet, nor have I done anything to
> > manage the larp sysfs knob, but we'll need to do those in the near
> > future.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > Zorro, can you consider merging this test in the near future?  We're
> > right at the point of merging the upstream kernel code and so really
> > need to start growing the test coverage of this feature, and this
> > test should simply not-run on kernels that don't have the feature
> > enabled....
> > 
> > Cheers,
> > 
> > Dave.
> > ---
> > 
> >  tests/xfs/600     |  20 +++++-----
> >  tests/xfs/600.out | 109 ++++++++++++++++++++++++++++++++++++------------------
> >  2 files changed, 85 insertions(+), 44 deletions(-)
> > 
> > diff --git a/tests/xfs/600 b/tests/xfs/600
> > index 252cdf27..84704646 100755
> > --- a/tests/xfs/600
> > +++ b/tests/xfs/600
> > @@ -16,7 +16,7 @@ _begin_fstest auto quick attr
> >  
> >  _cleanup()
> >  {
> > -	rm -rf $tmp.* $testdir
> > +	rm -rf $tmp.*
> >  	test -w /sys/fs/xfs/debug/larp && \
> >  		echo 0 > /sys/fs/xfs/debug/larp
> 
> Blergh, this ^^^^^^^^^ is going to need fixing too.
> 
> Please save the old value, then write it back in the _cleanup function.

Yes, I know about this. I haven't done anything about it yet because
I haven't decided yet whether it should stay or be removed. Getting
the test and the code it tests working has been my focus so far.

I have a wrapper script that sets the knob for the entire of the
fstests run (i.e. so all attr and recovery tests exercise the LARP
functionality) and so I'm not sure that this test should twiddle the
knob at all. i.e. right now it would be more useful to me for the
test to check if the knob is not set to just _not_run the test,
rather than turn it on and off and all that...

Indeed, we probably shouldn't be forcing everyone to test a brand
new experimental feature that might be unstable and cause the test
run to halt prematurely (e.g. because it crashes).

> <slightly ot rant>
> 
> These sysfs knobs are a pain because they all reset to defaults if
> xfs.ko gets cycled.  I know, I know, at least Dave and Ted don't do
> modular kernels and so never see this happen, but I do.

sysfs knobs get reset by reboots to load a new kernel, so the impact
is identical - I have to reset the knob every time I load a new test
kernel, just like you have to reset it every time you load a new
module.

> I bet Dave also
> hasn't ever run xfs/434 or xfs/436, which might be why I'm the only one
> seeing dquot leaks with 5.19-next.

Yup, there are a bunch of tests that I never run because:

- scsi_debug requires modular kernel
- I always build with fatal XFS ASSERTs, so the tests that want
  non-fatal asserts don't run
- xfs/43[4-6] require a modular kernel
- xfs/490 requires CONFIG_XFS_DEBUG be turned off (I *never* do
  that)
- I generally don't have source code on my test machines, so
  api-violations.sh won't run (xfs/437)

And there's probably a few more in that list as well.

IMO, any test that depends on a specific debug config or test
environment setup that aren't (or can't be) consistently used by the
entire developer base should be discouraged precisely because it
means different developers end up with different code coverage and
might miss problems they really should know about.

> (I might be able to lift the xfs-as-module requirement if "echo 1 >
> /sys/kernel/slab/*/validate" does what I think it might, since all we
> want to do is look for slab leaks, and those tests rmmod/modprobe as a
> brute force way of making the slab debug code check for leaks.)

Doesn't running fstests with KASAN enabled find those slab leaks?

> In case anyone's wondering, a solution to the knobs getting unset after
> an rmmod/modprobe cycle is to add a file
> /etc/udev/rules.d/99-fstester.rules containing:
> 
> ACTION=="add|change", SUBSYSTEM=="module", DEVPATH=="/module/${module}", RUN+="/bin/sh -c \"echo ${value} > ${knob}\""
> 
> which should be enough to keep LARP turned on.

I don't want it turned on unconditionally - I have to test both the
existing code path and the new LARP code path, so I'm constantly
toggling the value between test runs.

> Anyway, since this is a proposed test, I say that with this applied and
> the debug knob bits fixed, the whole thing is
> Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>

Thanks!

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux