Re: [WIP PATCH V2] Buffer resubmission test

Carlos Maiolino <cmaiolino@xxxxxxxxxx> · Mon, 10 Jul 2017 17:52:23 +0200

On Mon, Jul 10, 2017 at 11:13:44AM -0400, Brian Foster wrote:
> On Mon, Jul 10, 2017 at 05:06:12PM +0200, Carlos Maiolino wrote:
> > > 
> > > Thinking more about this, we could keep the sleep and also add the wait
> > > right before the unfreeze, right?
> > 
> > Well, it depends. We still need the sleep there: an lvextend right after the
> > freeze won't give xfsaild enough time to run, so the problem won't be
> > triggered. Adding a wait right before the unfreeze seems pointless to me.
> 
> Right, this wouldn't change the existing sleep at all. Basically it's
> just a preference to have the test block in wait rather than freeze and
> unfreeze because the latter seems a bit more confusing to me. Not a big
> deal either way.
> 
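
For reference, the ordering being discussed (keep the sleep, then block in
wait on the background freeze before unfreezing) could look roughly like the
sketch below. It is only an illustration: fake_freeze and fake_extend are
hypothetical stand-ins for "fsfreeze -f $SCRATCH_MNT" and the lvextend call,
so the sequence can be tried without a dm-thin device.

```shell
#!/bin/sh
# Sketch only: fake_freeze and fake_extend are hypothetical stand-ins for
# "fsfreeze -f $SCRATCH_MNT" and "$LVM_PROG lvextend", so the ordering can be
# tried without a dm-thin device.

workdir=$(mktemp -d)
flag="$workdir/extended"

fake_freeze() {
	# Blocks until "space" becomes available, like the real freeze does.
	while [ ! -e "$flag" ]; do
		sleep 1
	done
}

fake_extend() {
	# Makes "space" available so the blocked freeze can finish.
	touch "$flag"
}

fake_freeze &
freeze_pid=$!

# The existing sleep stays: in the real test it gives xfsaild time to run.
sleep 2

fake_extend

# Block here until the background freeze completes...
wait "$freeze_pid"
freeze_status=$?

# ...so the unfreeze that follows in the real test has nothing left to wait on.
rm -rf "$workdir"
echo "freeze exited with status $freeze_status"
```

With this ordering, a hang on an unpatched kernel would show up in the wait
rather than in the unfreeze.
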
> > > 
> > > > Let me know your thoughts.
> > > > 
> > > > cheers.
> > > > 
> > > > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> > > > ---
> > > 
> > > Mostly looks good to me. A few minor notes...
> > > 
> > > >  tests/xfs/999     | 119 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  tests/xfs/999.out |   2 +
> > > >  tests/xfs/group   |   1 +
> > > 
> > > The test is still under xfs (rather than generic).
> > > 
> > 
> > yup, didn't move it to generic yet. I have a question about it. How do we
> > specify which filesystem to run the test on without needing to mkfs the device
> > for the filesystem in question?
> > 
> 
> $FSTYP should contain the target fs, if I understand the question..?

Pretty much,

I'll update the patch and send it properly formatted with an updated test
number.
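
To make the $FSTYP point concrete, here is a rough sketch of how a test can
pick its mkfs command from $FSTYP instead of hard-coding one. mkfs_cmd_for is
a hypothetical helper written only for this illustration (it is not the
fstests implementation), and the xfs fallback is an assumption:

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of fstests.
mkfs_cmd_for() {
	case "$1" in
	xfs)   echo "mkfs.xfs -f" ;;
	ext4)  echo "mkfs.ext4 -F" ;;
	btrfs) echo "mkfs.btrfs -f" ;;
	*)     echo "mkfs -t $1" ;;
	esac
}

# Assumption: fall back to xfs when the harness has not set FSTYP.
FSTYP=${FSTYP:-xfs}
echo "would run: $(mkfs_cmd_for "$FSTYP") <device>"
```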

thanks for the review

> 
> Brian
> 
> > 
> > > >  3 files changed, 122 insertions(+)
> > > >  create mode 100755 tests/xfs/999
> > > >  create mode 100644 tests/xfs/999.out
> > > > 
> > > > diff --git a/tests/xfs/999 b/tests/xfs/999
> > > > new file mode 100755
> > > > index 0000000..b46f1cc
> > > > --- /dev/null
> > > > +++ b/tests/xfs/999
> > > > @@ -0,0 +1,119 @@
> > > > +#! /bin/bash
> > > > +# FS QA Test 999
> > > > +#
> > > > +# Test buffer resubmission after a failed writeback to a full, overcommitted
> > > > +# dm-thin device.
> > > > +#
> > > > +# When a dm-thin device reaches its full capacity, but the virtual device still
> > > > +# shows available space, XFS loops indefinitely in xfsaild due to items still
> > > > +# in the AIL. The buffers containing such items couldn't be resubmitted because
> > > > +# the items were flush locked. Test the kernel fix and ensure the buffers are
> > > > +# properly resubmitted.
> > > > +#
> > > > +# This test will hang the filesystem when run on an unpatched kernel.
> > > > +#
> > > > +#-----------------------------------------------------------------------
> > > > +# Copyright (c) 2017 Red Hat, Inc. All Rights Reserved.
> > > > +#
> > > > +# This program is free software; you can redistribute it and/or
> > > > +# modify it under the terms of the GNU General Public License as
> > > > +# published by the Free Software Foundation.
> > > > +#
> > > > +# This program is distributed in the hope that it would be useful,
> > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > +# GNU General Public License for more details.
> > > > +#
> > > > +# You should have received a copy of the GNU General Public License
> > > > +# along with this program; if not, write the Free Software Foundation,
> > > > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > > > +#-----------------------------------------------------------------------
> > > > +#
> > > > +
> > > > +seq=`basename $0`
> > > > +seqres=$RESULT_DIR/$seq
> > > > +echo "QA output created by $seq"
> > > > +
> > > > +here=`pwd`
> > > > +tmp=/tmp/$$
> > > > +status=1	# failure is the default!
> > > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > > +
> > > > +_cleanup()
> > > > +{
> > > > +	cd /
> > > > +	rm -f $tmp.*
> > > > +	$UMOUNT_PROG $SCRATCH_MNT >>$seqres.full 2>&1
> > > > +	$LVM_PROG vgremove -ff $vgname >>$seqres.full 2>&1
> > > > +	$LVM_PROG pvremove -ff $SCRATCH_DEV >>$seqres.full 2>&1
> > > > +}
> > > > +
> > > > +# get standard environment, filters and checks
> > > > +. ./common/rc
> > > > +. ./common/filter
> > > 
> > > Is filter used anywhere?
> > 
> > Nope. The first thing I was doing was filtering dmesg for the I/O errors
> > before extending the dm-thin device, and I forgot to remove the filter from
> > here. I'll remove it.
> > 
> > > 
> > > > +
> > > > +# real QA test starts here
> > > > +
> > > > +# Modify as appropriate.
> > > > +_supported_fs xfs
> > > > +_supported_os Linux
> > > > +_require_scratch_nocheck
> > > > +_require_dm_target thin-pool
> > > > +_require_command $LVM_PROG lvm
> > > > +
> > > > +# remove previous $seqres.full before test
> > > > +rm -f $seqres.full
> > > > +
> > > > +vgname=vg_$seq
> > > > +lvname=lv_$seq
> > > > +poolname=pool_$seq
> > > > +snapname=snap_$seq
> > > > +origpsize=100
> > > > +virtsize=200
> > > > +newpsize=200
> > > > +
> > > > +# Ensure we have enough disk space
> > > > +_scratch_mkfs_sized $((250 * 1024 * 1024)) >>$seqres.full 2>&1
> > > > +
> > > > +# Create a 100MB dm-thin POOL
> > > > +$LVM_PROG pvcreate -f $SCRATCH_DEV >>$seqres.full 2>&1
> > > > +$LVM_PROG vgcreate -f $vgname $SCRATCH_DEV >>$seqres.full 2>&1
> > > > +
> > > > +$LVM_PROG lvcreate  --thinpool $poolname  --errorwhenfull y \
> > > > +		    --zero n -L $origpsize \
> > > > +		    --poolmetadatasize 4M $vgname >>$seqres.full 2>&1
> > > > +
> > > > +# Create an overprovisioned 200MB dm-thin virt. device
> > > > +$LVM_PROG lvcreate  --virtualsize $virtsize \
> > > > +		    -T $vgname/$poolname \
> > > > +		    -n $lvname >>$seqres.full 2>&1
> > > > +
> > > > +_mkfs_dev /dev/mapper/$vgname-$lvname >>$seqres.full 2>&1
> > > > +
> > > > +
> > > > +$LVM_PROG lvcreate  -k n -s $vgname/$lvname \
> > > > +		    -n $snapname >>$seqres.full 2>&1
> > > 
> > > What's the reason for using a snapshot? Is the original thin vol not
> > > sufficient?
> > 
> > No, it's not. Honestly, I don't remember exactly why not; I've been using
> > the snapshot in my internal test case since I started working on this problem,
> > but without the snapshot I can't trigger the bug, at least not at a 100% rate.
> > 
> > > 
> > > > +
> > > > +_mount /dev/mapper/$vgname-$snapname $SCRATCH_MNT
> > > > +
> > > > +# Consume all space available in the volume and freeze to ensure everything
> > > > +# required to make the fs consistent is flushed to disk.
> > > > +xfs_io -f -d -c 'pwrite -b 1m 0 120m' $SCRATCH_MNT/f1 >>$seqres.full 2>&1
> > > 
> > > $XFS_IO_PROG
> > 
> > True, will change it
> > 
> > > 
> > > > +
> > > > +# This freeze will never complete until the dm-thin POOL device is extended.
> > > > +# This is expected; it is only used so xfsaild is triggered to flush AIL items.
> > > > +fsfreeze -f $SCRATCH_MNT &
> > > > +
> > > > +# Wait long enough for xfsaild to run
> > > > +sleep 10
> > > > +
> > > > +# Make some extra space available so the freeze above can proceed
> > > > +lvextend -L $newpsize $vgname/$poolname >>$seqres.full 2>&1
> > > 
> > > $LVM_PROG lvextend ?
> > > 
> > Indeed, missed it, will fix.
> > 
> > > Brian
> > > 
> > > > +
> > > > +# Try to thaw the filesystem, and complete the test if it succeeds.
> > > > +# NOTE: This will hang on affected XFS filesystems.
> > > > +fsfreeze -u $SCRATCH_MNT
> > > > +echo "Test OK"
> > > > +
> > > > +status=0
> > > > +exit
> > > > diff --git a/tests/xfs/999.out b/tests/xfs/999.out
> > > > new file mode 100644
> > > > index 0000000..8c3c938
> > > > --- /dev/null
> > > > +++ b/tests/xfs/999.out
> > > > @@ -0,0 +1,2 @@
> > > > +QA output created by 999
> > > > +Test OK
> > > > diff --git a/tests/xfs/group b/tests/xfs/group
> > > > index 792161a..2bde916 100644
> > > > --- a/tests/xfs/group
> > > > +++ b/tests/xfs/group
> > > > @@ -416,3 +416,4 @@
> > > >  416 dangerous_fuzzers dangerous_scrub dangerous_repair
> > > >  417 dangerous_fuzzers dangerous_scrub dangerous_online_repair
> > > >  418 dangerous_fuzzers dangerous_scrub dangerous_repair
> > > > +999 dangerous
> > > > -- 
> > > > 2.9.4
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > -- 
> > Carlos

-- 
Carlos