Re: [PATCH v3] fstests: btrfs: zoned: verify RAID conversion with write pointer mismatch

Naohiro Aota <Naohiro.Aota@xxxxxxx> · Wed, 19 Mar 2025 01:18:54 +0000

On Tue Mar 18, 2025 at 10:17 PM JST, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>
> Recently we had a bug report about a kernel crash that happened when the
> user was converting a filesystem to use RAID1 for metadata, but for some
> reason the device's write pointers got out of sync.
>
> Test this scenario by manually injecting de-synchronized write pointer
> positions and then running conversion to a metadata RAID1 filesystem.
>
> In the testcase also repair the broken filesystem and check if both system
> and metadata block groups are back to the default 'DUP' profile
> afterwards.
>
> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@xxxxxxxxxxxxxx/
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>
> ---
> Changes to v2:
> - Filter SCRATCH_MNT in golden output
> Changes to v1:
> - Add test description
> - Don't redirect stderr to $seqres.full
> - Use xfs_io instead of dd
> - Use $SCRATCH_MNT instead of hardcoded mount path
> - Check that 1st balance command actually fails as it's supposed to
> ---
>  tests/btrfs/329     | 62 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/329.out |  7 +++++
>  2 files changed, 69 insertions(+)
>  create mode 100755 tests/btrfs/329
>  create mode 100644 tests/btrfs/329.out
>
> diff --git a/tests/btrfs/329 b/tests/btrfs/329
> new file mode 100755
> index 000000000000..5496866ac325
> --- /dev/null
> +++ b/tests/btrfs/329
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025 Western Digital Corporation.  All Rights Reserved.
> +#
> +# FS QA Test 329
> +#
> +# Regression test for a kernel crash when converting a zoned BTRFS from
> +# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
> +# position in the target zone.
> +#
> +. ./common/preamble
> +_begin_fstest zone quick volume
> +
> +. ./common/filter
> +
> +_fixed_by_kernel_commit XXXXXXXXXXXX \
> +	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
> +
> +_require_scratch_dev_pool 2
> +declare -a devs="( $SCRATCH_DEV_POOL )"
> +_require_zoned_device ${devs[0]}
> +_require_zoned_device ${devs[1]}
> +_require_command "$BLKZONE_PROG" blkzone
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount
> +
> +# Write some data to the FS to dirty it
> +$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
> +
> +# Add device two to the FS
> +$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
> +
> +# Move write pointers of all empty zones by 4k to simulate write pointer
> +# mismatch.
> +zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
> +	sed 's/,//')

Can we limit the number of zones we touch here, in case this test runs on
a huge device? I guess 2*(128M/4M)=64 zones would be enough.
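
Something along these lines might do it. This is only an untested sketch:
first_empty_zones is a hypothetical helper name (it does not exist in
fstests), the limit of 64 is just the estimate above, and matching on
"(em)" rather than "em" is my assumption about the blkzone report format.

```shell
# Hypothetical helper, not part of fstests: read `blkzone report` output on
# stdin and print the start sector of at most $1 zones whose condition is
# empty ("(em)"), one per line, with the trailing comma stripped.
first_empty_zones() {
	local max=$1

	awk -v max="$max" '
		/\(em\)/ {
			sub(/,/, "", $2)	# "0x000080000," -> "0x000080000"
			print $2
			if (++n >= max)		# stop once enough zones were printed
				exit
		}'
}

# Possible use in the test (assumes the variables from the patch):
#   zones=$($BLKZONE_PROG report ${devs[1]} | first_empty_zones 64)
```

The loop below could then stay as it is, it would just iterate over at most
64 zones instead of every empty zone on the device.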

> +for zone in $zones;
> +do
> +	# We have to ignore the output here, as a) we don't know the number of
> +	# zones that have dirtied and b) if we run over the maximal number of
> +	# active zones, xfs_io will output errors, both we don't care.
> +	$XFS_IO_PROG -fdc "pwrite $(($zone << 9)) 4096" ${devs[1]} > /dev/null 2>&1
> +done
> +
> +# expected to fail
> +$BTRFS_UTIL_PROG balance start -mconvert=raid1 $SCRATCH_MNT 2>&1 |\
> +	_filter_scratch
> +
> +_scratch_unmount
> +
> +$MOUNT_PROG -t btrfs -odegraded ${devs[0]} $SCRATCH_MNT
> +
> +$BTRFS_UTIL_PROG device remove --force missing $SCRATCH_MNT >> $seqres.full
> +$BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT >> $seqres.full
> +
> +# Check that both System and Metadata are back to the DUP profile
> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT |\
> +	grep -o -e "System, DUP" -e "Metadata, DUP"
> +
> +status=0
> +exit
> diff --git a/tests/btrfs/329.out b/tests/btrfs/329.out
> new file mode 100644
> index 000000000000..e47a2a6ff04b
> --- /dev/null
> +++ b/tests/btrfs/329.out
> @@ -0,0 +1,7 @@
> +QA output created by 329
> +wrote 134217728/134217728 bytes at offset 0
> +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +ERROR: error during balancing 'SCRATCH_MNT': Input/output error
> +There may be more info in syslog - try dmesg | tail
> +System, DUP
> +Metadata, DUP