On 2022/7/27 07:09, Boris Burkov wrote:
On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
The new test case will verify that btrfs can handle one corrupted device
without affecting the consistency of the filesystem.
Unlike a missing device, a corrupted device can return garbage to the fs,
so btrfs has to utilize its data/metadata checksums to verify which
copy of the data is correct.
The test case will:
- Create a small fs
Mostly to speed up the test
- Fill the fs with a regular file
- Use fsstress to create some contents
- Save the fssum for later verification
- Corrupt one device with garbage but keep the primary superblock
untouched
- Run fssum verification
- Run scrub to fix the fs
- Run scrub again to make sure the fs is fine
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
Works for me, and looks like a nice test to complement btrfs/027.
Reviewed-by: Boris Burkov <boris@xxxxxx>
---
tests/btrfs/261 | 94 +++++++++++++++++++++++++++++++++++++++++++++
tests/btrfs/261.out | 2 +
2 files changed, 96 insertions(+)
create mode 100755 tests/btrfs/261
create mode 100644 tests/btrfs/261.out
diff --git a/tests/btrfs/261 b/tests/btrfs/261
new file mode 100755
index 00000000..15218e28
--- /dev/null
+++ b/tests/btrfs/261
@@ -0,0 +1,94 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 261
+#
+# Make sure btrfs raid profiles can handle one corrupted device
+# without affecting the consistency of the fs.
+#
+. ./common/preamble
+_begin_fstest raid
+
+. ./common/filter
+. ./common/populate
+
+_supported_fs btrfs
+_require_scratch_dev_pool 4
+_require_fssum
+
+prepare_fs()
+{
+ local profile=$1
+
+ # We don't want a fs that is too large, as it can take too long to
+ # populate. The extra redirection of stderr keeps the RAID56 warning
+ # messages from polluting the golden output.
+ _scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
+ if [ $? -ne 0 ]; then
+ echo "mkfs for profile $profile failed"
+ return
+ fi
+
+ # Disable compression, as compressed read repair is known to have problems
+ _scratch_mount -o compress=no
+
+ # Fill some part of the fs first
+ $XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
+
+ # Then use fsstress to generate some extra contents.
+ # Disable setattr related operations, as they may set NODATACOW, which
+ # would prevent us from using btrfs checksums to verify the content.
+ $FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
+ sync
+
+ # Save the fssum of this fs
+ $FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
+ $BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
+ _scratch_unmount
+}
+
+workload()
+{
+ local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
+ local profile=$1
+ local num_devs=$2
+
+ _scratch_dev_pool_get $num_devs
+ echo "=== Testing profile $profile ===" >> $seqres.full
+ rm -f -- $tmp.saved_fssum
+ prepare_fs $profile
+
+ # Corrupt the target device, only keep the superblock.
+ $XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
+
+ _scratch_mount
+
+ # All content should be fine
+ $FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
+
+ # Scrub to fix the fs, this is known to report various correctable
+ # errors
+ $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
+
+ # Make sure above scrub fixed the fs
+ $BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
+ if [ $? -ne 0 ]; then
+ echo "scrub failed to fix the fs for profile $profile"
+ fi
+ _scratch_unmount
+ _scratch_dev_pool_put
+}
+
+workload raid1 2
+workload raid1c3 3
+workload raid1c4 4
+workload raid10 4
+workload raid5 3
+workload raid6 4
Speaking of 027, can you implement this with _btrfs_profile_configs?
In fact _btrfs_profile_configs() is what I went with initially, but the
following factors prevent it from being used:
- No way to specify the number of devices
  That's not a big deal though.
- No way to skip unsupported profiles
  DUP/SINGLE, for example, cannot meet the duplication requirement.
  And since _btrfs_profile_configs() directly provides the mkfs options,
  it's not easy to just pick what we need.
Personally speaking, it would be much easier if we made
_btrfs_profile_configs() just return a profile, not the full mkfs options.
Further, you could imagine writing a more generic test that does:

    for raid in raids:
        create-fs raid
        screw-up disk(s)
        check-condition
I have seen some helpers in `common/populate`, but unfortunately they
only support XFS and EXT4.
It may be a good idea to enhance that file, though.
Thanks,
Qu
and make 027 and this new one (and others?) special cases of that.
I think this might fall under YAGNI... Food for thought :)
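Purely as food for thought in return, the generic driver shape sketched above
could look like this. Every helper name here is a placeholder I made up;
nothing in fstests provides them. The point is only that btrfs/027 (missing
device) and this test (corrupted device) differ in the "damage" and "check"
hooks, not in the driver loop:

```shell
#!/bin/bash
# Hypothetical driver loop with pluggable damage/check hooks.
# All helpers are stubs standing in for real fstests infrastructure.

degrade_remove_device()  { echo "removing one device"; }     # 027-style
degrade_corrupt_device() { echo "overwriting one device"; }  # this test

check_fssum() { echo "verifying fssum"; }
check_scrub() { echo "running scrub"; }

# Run one profile through the generic create/damage/check cycle.
# $1: profile, $2: damage hook, $3: check hook
run_one_config()
{
	local profile=$1 damage=$2 check=$3

	echo "=== $profile ==="
	# create-fs $profile (stubbed out)
	"$damage"
	# mount (stubbed out)
	"$check"
	# unmount (stubbed out)
}

for profile in raid1 raid1c3 raid1c4 raid10 raid5 raid6; do
	run_one_config "$profile" degrade_corrupt_device check_scrub
done
```

Whether that generality pays for itself across only two tests is exactly the
YAGNI question; the sketch just shows the refactor would be small.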
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
new file mode 100644
index 00000000..679ddc0f
--- /dev/null
+++ b/tests/btrfs/261.out
@@ -0,0 +1,2 @@
+QA output created by 261
+Silence is golden
--
2.36.1