[PATCH] fstests: add test case to make sure btrfs can handle one corrupted device

Qu Wenruo <wqu@xxxxxxxx> · Tue, 26 Jul 2022 14:29:48 +0800

The new test case will verify that btrfs can handle one corrupted device
without affecting the consistency of the filesystem.

Unlike a missing device, one corrupted device can return garbage to the fs,
thus btrfs has to utilize its data/metadata checksum to verify which
data is correct.

The test case will:

- Create a small fs
  Mostly to speedup the test

- Fill the fs with a regular file

- Use fsstress to create some contents

- Save the fssum for later verification

- Corrupt one device with garbage but keep the primary superblock
  untouched

- Run fssum verification

- Run scrub to fix the fs

- Run scrub again to make sure the fs is fine

Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
 tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/261.out |  2 +
 2 files changed, 96 insertions(+)
 create mode 100755 tests/btrfs/261
 create mode 100644 tests/btrfs/261.out

diff --git a/tests/btrfs/261 b/tests/btrfs/261
new file mode 100755
index 00000000..15218e28
--- /dev/null
+++ b/tests/btrfs/261
@@ -0,0 +1,94 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 261
+#
+# Make sure btrfs raid profiles can handling one corrupted device
+# without affecting the consistency of the fs.
+#
+. ./common/preamble
+_begin_fstest raid
+
+. ./common/filter
+. ./common/populate
+
+_supported_fs btrfs
+_require_scratch_dev_pool 4
+_require_fssum
+
+prepare_fs()
+{
+	local profile=$1
+
+	# We don't want too large fs which can take too long to populate
+	# And the extra redirection of stderr is to avoid the RAID56 warning
+	# message to polluate the golden output
+	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
+	if [ $? -ne 0 ]; then
+		echo "mkfs $mkfs_opts failed"
+		return
+	fi
+
+	# Disable compression, as compressed read repair is known to have problems
+	_scratch_mount -o compress=no
+
+	# Fill some part of the fs first
+	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
+
+	# Then use fsstress to generate some extra contents.
+	# Disable setattr related operations, as it may set NODATACOW which will
+	# not allow us to use btrfs checksum to verify the content.
+	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
+	sync
+
+	# Save the fssum of this fs
+	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
+	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
+	_scratch_unmount
+}
+
+workload()
+{
+	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
+	local profile=$1
+	local num_devs=$2
+
+	_scratch_dev_pool_get $num_devs
+	echo "=== Testing profile $profile ===" >> $seqres.full
+	rm -f -- $tmp.saved_fssum
+	prepare_fs $profile
+
+	# Corrupt the target device, only keep the superblock.
+	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
+
+	_scratch_mount
+
+	# All content should be fine
+	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
+
+	# Scrub to fix the fs, this is known to report various correctable
+	# errors
+	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
+
+	# Make sure above scrub fixed the fs
+	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
+	if [ $? -ne 0 ]; then
+		echo "scrub failed to fix the fs for profile $profile"
+	fi
+	_scratch_unmount
+	_scratch_dev_pool_put
+}
+
+workload raid1 2
+workload raid1c3 3
+workload raid1c4 4
+workload raid10 4
+workload raid5 3
+workload raid6 4
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
new file mode 100644
index 00000000..679ddc0f
--- /dev/null
+++ b/tests/btrfs/261.out
@@ -0,0 +1,2 @@
+QA output created by 261
+Silence is golden
-- 
2.36.1