Re: [PATCH] xfs: new test to ensure xfs can capture IO errors correctly

Zorro Lang <zlang@xxxxxxxxxx> · Thu, 27 Oct 2022 10:24:59 +0800

On Wed, Oct 26, 2022 at 11:30:29AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> > There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> > we capture IO errors correctly"). Add a test to cover this bug and make
> > sure xfs captures IO errors correctly and doesn't panic or hang again.
> > 
> > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx>
> > ---
> > 
> > Hi,
> > 
> > When I tried to tidy up our internal test cases recently, I found a very
> > old case which tries to cover e001873853d8 ("xfs: ensure we capture IO errors
> > correctly"), a fix by Dave. At that time, we didn't have xfs error
> > injection, so we tested it with a systemtap script [1] to inject an ioerror.
> > 
> > This bug was fixed a long time ago (9+ years), and that stap script is out of
> > date and doesn't work with new kernels. The good news is that we have xfs error
> > injection now, so I'm trying to revive this test case in fstests.
> > 
> > I didn't verify whether this case can reproduce that bug on old rhel (which
> > doesn't support error injection). The original case injected errno 11; I'm
> > not sure if it's worth trying other errors. I searched for "buf_ioerror" in
> > fstests and found nothing. So this bug may be old, but this kind of test is
> > still worth covering. Feel free to tell me if you have any suggestions :)
> > 
> > Thanks,
> > Zorro
> > 
> > [1]
> > probe module("xfs").function("xfs_buf_bio_end_io")
> > {
> >         if ($error == 0) {
> >                 if ($bio->bi_rw & (1 << 4)) {
> >                         $error = -11;
> >                         printf("%s: comm %s, pid %d, setting error 11\n",
> >                                 probefunc(), execname(), pid());
> >                         print_stack(backtrace());
> >                 }
> >         }
> > }
> > 
> >  tests/xfs/554     | 53 +++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/554.out |  4 ++++
> >  2 files changed, 57 insertions(+)
> >  create mode 100755 tests/xfs/554
> >  create mode 100644 tests/xfs/554.out
> > 
> > diff --git a/tests/xfs/554 b/tests/xfs/554
> > new file mode 100755
> > index 00000000..6935bfc0
> > --- /dev/null
> > +++ b/tests/xfs/554
> > @@ -0,0 +1,53 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 YOUR NAME HERE.  All Rights Reserved.
> 
> Mr. YOUR HERE,
> 
> Please write your real name in the copyright statement.
> 
> > +#
> > +# FS QA Test 554
> > +#
> > +# There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure we
> > +# capture IO errors correctly"). Try to cover this bug and make sure
> > +# xfs captures IO errors correctly and doesn't panic or hang again.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto eio
> > +
> > +_cleanup()
> > +{
> > +	$KILLALL_PROG -q fsstress 2> /dev/null
> > +	# ensure all fsstress processes have exited
> > +	wait
> > +	# replay the log, since the buf_ioerror injection might leave it dirty
> > +	_scratch_cycle_mount
> > +	cd /
> > +	rm -r -f $tmp.*
> > +}
> > +
> > +# Import common functions.
> > +. ./common/inject
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_require_command "$KILLALL_PROG" "killall"
> > +_require_scratch
> > +_require_xfs_debug
> > +_require_xfs_io_error_injection "buf_ioerror"
> > +
> > +_scratch_mkfs >> $seqres.full
> > +_scratch_mount
> > +
> > +echo "Inject buf ioerror tag"
> > +_scratch_inject_error buf_ioerror 11
> > +
> > +echo "Random I/Os testing ..."
> > +$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
> > +for ((i=0; i<5; i++));do
> > +	# Clear caches, then try to use 'find' to trigger readahead
> 
> BUF_IOERROR only seems to apply to async writes:
> 
> static void
> xfs_buf_bio_end_io(
> 	struct bio		*bio)
> {
> 	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
> 
> 	if (!bio->bi_status &&
> 	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> 	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> 		bio->bi_status = BLK_STS_IOERR;
> 
> So I don't see how this would reproduce the problem of b_error not being
> cleared after a failed readahead and re-read?

Oh, "(bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC)" ... so I have no
chance to cover this bug with buf_ioerror? Then I either have to abandon this
patch, or we could change it into a general async ioerror injection test.
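
To make the gate in the quoted xfs_buf_bio_end_io() easier to reason about, here
is a rough bash model of it. This is only a sketch, not kernel code: the function
name buf_ioerror_would_inject and its 0/1 arguments are made up for illustration,
and the last argument simply stands in for XFS_TEST_ERROR() returning true.

```shell
#!/bin/bash
# Hypothetical model of the condition quoted above; not kernel code.
# Arguments:
#   $1 bio_status  0 means the bio completed without error
#   $2 is_write    1 if XBF_WRITE would be set on the buffer
#   $3 is_async    1 if XBF_ASYNC would be set on the buffer
#   $4 tag_fires   1 if the buf_ioerror errtag decides to fire this time
buf_ioerror_would_inject()
{
	local bio_status=$1 is_write=$2 is_async=$3 tag_fires=$4

	# All four conditions must hold, mirroring the && chain in the C code.
	if [ "$bio_status" -eq 0 ] && [ "$is_write" -eq 1 ] &&
	   [ "$is_async" -eq 1 ] && [ "$tag_fires" -eq 1 ]; then
		echo "inject"
	else
		echo "pass"
	fi
}

# Any read (readahead included) lacks XBF_WRITE, so nothing is injected.
buf_ioerror_would_inject 0 0 1 1
# A successful async write is the only case that gets the error.
buf_ioerror_would_inject 0 1 1 1
```

With this model, the 'find' loop in the patch only generates reads, so it would
never hit the injected error; only the async writes coming from fsstress could.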

Thanks,
Zorro

> 
> --D
> 
> > +	echo 3 > /proc/sys/vm/drop_caches
> > +	find $SCRATCH_MNT >/dev/null 2>&1
> > +	sleep 3
> > +done
> > +
> > +echo "No hang or panic"
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/xfs/554.out b/tests/xfs/554.out
> > new file mode 100644
> > index 00000000..26910daa
> > --- /dev/null
> > +++ b/tests/xfs/554.out
> > @@ -0,0 +1,4 @@
> > +QA output created by 554
> > +Inject buf ioerror tag
> > +Random I/Os testing ...
> > +No hang or panic
> > -- 
> > 2.31.1
> > 
>