On 19.03.25 02:18, Naohiro Aota wrote:
> On Tue Mar 18, 2025 at 10:17 PM JST, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>>
>> Recently we had a bug report about a kernel crash that happened when the
>> user was converting a filesystem to use RAID1 for metadata, but for some
>> reason the device's write pointers got out of sync.
>>
>> Test this scenario by manually injecting de-synchronized write pointer
>> positions and then running conversion to a metadata RAID1 filesystem.
>>
>> In the testcase also repair the broken filesystem and check if both system
>> and metadata block groups are back to the default 'DUP' profile
>> afterwards.
>>
>> Link: https://lore.kernel.org/linux-btrfs/CAB_b4sBhDe3tscz=duVyhc9hNE+gu=B8CrgLO152uMyanR8BEA@xxxxxxxxxxxxxx/
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>>
>> ---
>> Changes to v2:
>> - Filter SCRATCH_MNT in golden output
>> Changes to v1:
>> - Add test description
>> - Don't redirect stderr to $seqres.full
>> - Use xfs_io instead of dd
>> - Use $SCRATCH_MNT instead of hardcoded mount path
>> - Check that 1st balance command actually fails as it's supposed to
>> ---
>>  tests/btrfs/329     | 62 +++++++++++++++++++++++++++++++++++++++++++++
>>  tests/btrfs/329.out |  7 +++++
>>  2 files changed, 69 insertions(+)
>>  create mode 100755 tests/btrfs/329
>>  create mode 100644 tests/btrfs/329.out
>>
>> diff --git a/tests/btrfs/329 b/tests/btrfs/329
>> new file mode 100755
>> index 000000000000..5496866ac325
>> --- /dev/null
>> +++ b/tests/btrfs/329
>> @@ -0,0 +1,62 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2025 Western Digital Corporation. All Rights Reserved.
>> +#
>> +# FS QA Test 329
>> +#
>> +# Regression test for a kernel crash when converting a zoned BTRFS from
>> +# metadata DUP to RAID1 and one of the devices has a non 0 write pointer
>> +# position in the target zone.
>> +#
>> +. ./common/preamble
>> +_begin_fstest zone quick volume
>> +
>> +. ./common/filter
>> +
>> +_fixed_by_kernel_commit XXXXXXXXXXXX \
>> +	"btrfs: zoned: return EIO on RAID1 block group write pointer mismatch"
>> +
>> +_require_scratch_dev_pool 2
>> +declare -a devs="( $SCRATCH_DEV_POOL )"
>> +_require_zoned_device ${devs[0]}
>> +_require_zoned_device ${devs[1]}
>> +_require_command "$BLKZONE_PROG" blkzone
>> +
>> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
>> +_scratch_mount
>> +
>> +# Write some data to the FS to dirty it
>> +$XFS_IO_PROG -fc "pwrite 0 128M" $SCRATCH_MNT/test | _filter_xfs_io
>> +
>> +# Add device two to the FS
>> +$BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
>> +
>> +# Move write pointers of all empty zones by 4k to simulate write pointer
>> +# mismatch.
>> +zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
>> +	sed 's/,//')
>
> Can we limit the number of zones to work with, in case we run this test
> on a huge device? I guess 2*(128M/4M)=64 would be enough.
>

I.e. something like the following:

diff --git a/tests/btrfs/329 b/tests/btrfs/329
index 5496866ac325..24d34852db1f 100755
--- a/tests/btrfs/329
+++ b/tests/btrfs/329
@@ -33,8 +33,14 @@ $BTRFS_UTIL_PROG device add ${devs[1]} $SCRATCH_MNT >> $seqres.full
 
 # Move write pointers of all empty zones by 4k to simulate write pointer
 # mismatch.
+
+nzones=$($BLKZONE_PROG report ${devs[1]} | wc -l)
+if [ $nzones -gt 64 ]; then
+	nzones=64
+fi
+
 zones=$($BLKZONE_PROG report ${devs[1]} | $AWK_PROG '/em/ { print $2 }' |\
-	sed 's/,//')
+	sed 's/,//' | head -n $nzones)
 for zone in $zones; do

Yup this still triggers the bug on an unpatched kernel in my case and the
fix also fixes it.

So yes I'll update the testcase (I guess Filipe's R-b remains with this
change).
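
For anyone who wants to try the zone capping without a zoned device at hand, here is a rough standalone sketch of the same pipeline. The helper names (limit_empty_zones, sample_report) and the sample report text are made up for illustration and only assume the usual shape of "blkzone report" output; the cap of 64 is just the 2*(128M/4M) estimate from the review above. It is not the code that will go into the testcase.

```shell
#!/bin/bash
# Sketch only: helper names and sample data are hypothetical.

# Cap taken from the 2*(128M/4M) estimate in the review.
max_zones=$(( 2 * (128 / 4) ))

# Read blkzone-report-like text on stdin and print the start offsets of
# at most $1 empty zones. Same awk/sed filter as in the testcase, with
# head doing the capping.
limit_empty_zones() {
	local max="$1"
	awk '/em/ { print $2 }' | sed 's/,//' | head -n "$max"
}

# Assumed shape of "blkzone report" lines: one full zone, three empty ones.
sample_report() {
	cat <<'EOF'
  start: 0x000000000, len 0x080000, cap 0x080000, wptr 0x080000 reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
  start: 0x000080000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
  start: 0x000100000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
  start: 0x000180000, len 0x080000, cap 0x080000, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
EOF
}

# With a cap of 2, only the first two empty zones are selected.
sample_report | limit_empty_zones 2

# With the full cap, all three empty zones come through unchanged.
sample_report | limit_empty_zones "$max_zones"
```

Since head simply passes through everything when its input has fewer lines than the limit, the sketch does not need the extra "wc -l" step; whether the testcase keeps that step is a matter of taste.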