From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>

Recently we had a bug report about a kernel crash that happened when the
user was converting a filesystem to use RAID1 for metadata, but for some
reason the devices' write pointers got out of sync.

Test this scenario by manually injecting de-synchronized write pointer
positions and then running a conversion to a metadata RAID1 filesystem.

In the testcase, also repair the broken filesystem and check that both
the system and metadata block groups are back to the default 'DUP'
profile afterwards.

Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@xxxxxxxxxxxxxx/
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>
---
Changes to v3:
- Limit number of dirtied zones to 64
Changes to v2:
- Filter SCRATCH_MNT in golden output
Changes to v1:
- Add test description
- Don't redirect stderr to $seqres.full
- Use xfs_io instead of dd
- Use $SCRATCH_MNT instead of hardcoded mount path
- Check that 1st balance command actually fails as it's supposed to
---
 tests/btrfs/329     | 68 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/329.out |  7 +++++
 2 files changed, 75 insertions(+)
 create mode 100755 tests/btrfs/329
 create mode 100644 tests/btrfs/329.out

diff --git a/tests/btrfs/329 b/tests/btrfs/329
new file mode 100755
index 000000000000..24d34852db1f
--- /dev/null
+++ b/tests/btrfs/329
@@ -0,0 +1,68 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Western Digital Corporation.  All Rights Reserved.
+#
+# FS QA Test 329
+#
+# Regression test for a kernel crash when converting a zoned BTRFS from
+# metadata DUP to RAID1 while one of the devices has a non-zero write
+# pointer position in the target zone.
+#
+. ./common/preamble
+_begin_fstest zone quick volume

+. ./common/filter
+
+_fixed_by_kernel_commit XXXXXXXXXXXX \
+	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
+
+_require_scratch_dev_pool 2
+declare -a devs="( $SCRATCH_DEV_POOL )"
+_require_zoned_device ${devs[0]}
+_require_zoned_device ${devs[1]}
+_require_command "$BLKZONE_PROG" blkzone
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount
+
+# Write some data to the FS to dirty it
+$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
+
+# Add the second device to the FS
+$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
+
+# Advance the write pointers of up to 64 empty zones by 4k to simulate a
+# write pointer mismatch.
+nzones=$($BLKZONE_PROG report ${devs[1]} | wc -l)
+if [ $nzones -gt 64 ]; then
+	nzones=64
+fi
+
+zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
+	sed 's/,//' | head -n $nzones)
+for zone in $zones; do
+	# Ignore the output here, as a) we don't know how many zones
+	# actually get dirtied and b) once we exceed the maximum number of
+	# active zones, xfs_io will report errors; neither case matters.
+	$XFS_IO_PROG -fdc "pwrite $(($zone << 9)) 4096" ${devs[1]} > /dev/null 2>&1
+done
+
+# The balance conversion is expected to fail due to the write pointer mismatch
+$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT 2>&1 |\
+	_filter_scratch
+
+_scratch_unmount
+
+$MOUNT_PROG -t btrfs -odegraded ${devs[0]} $SCRATCH_MNT
+
+$BTRFS_UTIL_PROG device remove --force missing $SCRATCH_MNT >> $seqres.full
+$BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT >> $seqres.full
+
+# Check that both System and Metadata are back to the DUP profile
+$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT |\
+	grep -o -e "System, DUP" -e "Metadata, DUP"
+
+status=0
+exit
diff --git a/tests/btrfs/329.out b/tests/btrfs/329.out
new file mode 100644
index 000000000000..e47a2a6ff04b
--- /dev/null
+++ b/tests/btrfs/329.out
@@ -0,0 +1,7 @@
+QA output created by 329
+wrote 134217728/134217728 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR: error during balancing 'SCRATCH_MNT': Input/output error
+There may be more info in syslog - try dmesg | tail
+System, DUP
+Metadata, DUP
-- 
2.43.0