Re: [PATCH] fstests: add test case to make sure btrfs can handle one corrupted device

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Wed, Jul 27, 2022 at 09:39:46AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/7/27 07:09, Boris Burkov wrote:
> > On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
> > > The new test case will verify that btrfs can handle one corrupted device
> > > without affecting the consistency of the filesystem.
> > > 
> > > Unlike a missing device, one corrupted device can return garbage to the fs,
> > > thus btrfs has to utilize its data/metadata checksum to verify which
> > > data is correct.
> > 
> > > 
> > > The test case will:
> > > 
> > > - Create a small fs
> > >    Mostly to speedup the test
> > > 
> > > - Fill the fs with a regular file
> > > 
> > > - Use fsstress to create some contents
> > > 
> > > - Save the fssum for later verification
> > > 
> > > - Corrupt one device with garbage but keep the primary superblock
> > >    untouched
> > > 
> > > - Run fssum verification
> > > 
> > > - Run scrub to fix the fs
> > > 
> > > - Run scrub again to make sure the fs is fine
> > > 
> > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> > 
> > Works for me, and looks like a nice test to complement btrfs/027.
> > Reviewed-by: Boris Burkov <boris@xxxxxx>
> > 
> > > ---
> > >   tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > >   tests/btrfs/261.out |  2 +
> > >   2 files changed, 96 insertions(+)
> > >   create mode 100755 tests/btrfs/261
> > >   create mode 100644 tests/btrfs/261.out
> > > 
> > > diff --git a/tests/btrfs/261 b/tests/btrfs/261
> > > new file mode 100755
> > > index 00000000..15218e28
> > > --- /dev/null
> > > +++ b/tests/btrfs/261
> > > @@ -0,0 +1,94 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FS QA Test 261
> > > +#
> > > +# Make sure btrfs raid profiles can handling one corrupted device
> > > +# without affecting the consistency of the fs.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest raid
> > > +
> > > +. ./common/filter
> > > +. ./common/populate
> > > +
> > > +_supported_fs btrfs
> > > +_require_scratch_dev_pool 4
> > > +_require_fssum
> > > +
> > > +prepare_fs()
> > > +{
> > > +	local profile=$1
> > > +
> > > +	# We don't want too large fs which can take too long to populate
> > > +	# And the extra redirection of stderr is to avoid the RAID56 warning
> > > +	# message to polluate the golden output
> > > +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
> > > +	if [ $? -ne 0 ]; then
> > > +		echo "mkfs $mkfs_opts failed"
> > > +		return
> > > +	fi
> > > +
> > > +	# Disable compression, as compressed read repair is known to have problems
> > > +	_scratch_mount -o compress=no
> > > +
> > > +	# Fill some part of the fs first
> > > +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
> > > +
> > > +	# Then use fsstress to generate some extra contents.
> > > +	# Disable setattr related operations, as it may set NODATACOW which will
> > > +	# not allow us to use btrfs checksum to verify the content.
> > > +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
> > > +	sync
> > > +
> > > +	# Save the fssum of this fs
> > > +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
> > > +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
> > > +	_scratch_unmount
> > > +}
> > > +
> > > +workload()
> > > +{
> > > +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
> > > +	local profile=$1
> > > +	local num_devs=$2
> > > +
> > > +	_scratch_dev_pool_get $num_devs
> > > +	echo "=== Testing profile $profile ===" >> $seqres.full
> > > +	rm -f -- $tmp.saved_fssum
> > > +	prepare_fs $profile
> > > +
> > > +	# Corrupt the target device, only keep the superblock.
> > > +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
> > > +
> > > +	_scratch_mount
> > > +
> > > +	# All content should be fine
> > > +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
> > > +
> > > +	# Scrub to fix the fs, this is known to report various correctable
> > > +	# errors
> > > +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
> > > +
> > > +	# Make sure above scrub fixed the fs
> > > +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
> > > +	if [ $? -ne 0 ]; then
> > > +		echo "scrub failed to fix the fs for profile $profile"
> > > +	fi
> > > +	_scratch_unmount
> > > +	_scratch_dev_pool_put
> > > +}
> > > +
> > > +workload raid1 2
> > > +workload raid1c3 3
> > > +workload raid1c4 4
> > > +workload raid10 4
> > > +workload raid5 3
> > > +workload raid6 4
> > 
> > Speaking of 027,
> > 
> > Can you implement this with _btrfs_profile_configs?
> 
> In fact _btrfs_profile_configs() is what I went initially, but the
> following factors prevent it from being used:
> 
> - No way to specify the number of devices
>   That's not a big deal though.
> 
> - No way to skip unsupported profiles
>   Like DUP/SINGLE can not meet the duplication requirement.

It takes an argument and if you pass "replace-missing" I think it does
what you want. You could also add another string it accepts for
different semantics.

> 
> 
> And since _btrfs_profiles_configs() directly provides the mkfs options,
> it's not easy to just pick what we need.
> 
> Personally speaking, it would be much easier if we make
> _btrfs_profiles_configs() to just return a profile, not full mkfs options.

Giving the caller more flexibility with their call to mkfs does seem
like a good idea to me.

> 
> > 
> > Further, you could imagine writing a more generic test that does:
> > for raid in raids:
> >          create-fs raid
> >          screw-up disk(s)
> >          check-condition
> 
> I have seen some helper in `common/populate`, but unfortunately it only
> supports XFS and EXT4.
> 
> It may be a good idea to enhance that file though.

Cool, never noticed that before. Would be interesting to make it work
with btrfs.

> 
> Thanks,
> Qu
> 
> > 
> > and make 027 and this new one (and others?) special cases of that.
> > I think this might fall under YAGNI.. Food for thought :)
> > 
> > > +
> > > +echo "Silence is golden"
> > > +
> > > +# success, all done
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
> > > new file mode 100644
> > > index 00000000..679ddc0f
> > > --- /dev/null
> > > +++ b/tests/btrfs/261.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 261
> > > +Silence is golden
> > > --
> > > 2.36.1
> > > 



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux