Re: [PATCH] generic: add test for missing btrfs csums in log when doing async on subpage vol

Mark Harmstone <maharmstone@xxxxxxxx> · Tue, 29 Oct 2024 10:03:36 +0000

On 29/10/24 05:11, Zorro Lang wrote:
> On Mon, Oct 28, 2024 at 02:57:28PM -0700, Darrick J. Wong wrote:
>> On Tue, Oct 15, 2024 at 04:39:34PM +0100, Mark Harmstone wrote:
>>> Adds a test for a bug we encountered on Linux 6.4 on aarch64, where a
>>> race could mean that csums weren't getting written to the log tree,
>>> leading to corruption when it was replayed.
>>>
>>> The patches to detect this log tree corruption are in btrfs-progs 6.11.
>>>
>>> Signed-off-by: Mark Harmstone <maharmstone@xxxxxx>
>>> ---
>>> This is a genericized version of the test I originally proposed as
>>> btrfs/333.
>>>
>>>   tests/generic/757     | 71 +++++++++++++++++++++++++++++++++++++++++++
>>>   tests/generic/757.out |  2 ++
>>>   2 files changed, 73 insertions(+)
>>>   create mode 100755 tests/generic/757
>>>   create mode 100644 tests/generic/757.out
>>>
>>> diff --git a/tests/generic/757 b/tests/generic/757
>>> new file mode 100755
>>> index 00000000..6ad3d01e
>>> --- /dev/null
>>> +++ b/tests/generic/757
>>> @@ -0,0 +1,71 @@
>>> +#! /bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +#
>>> +# FS QA Test 757
>>> +#
>>> +# Test async dio with fsync to test a btrfs bug where a race meant that csums
>>> +# weren't getting written to the log tree, causing corruptions on remount.
>>> +# This can be seen on subpage FSes on Linux 6.4.
>>> +#
>>> +. ./common/preamble
>>> +_begin_fstest auto quick metadata log recoveryloop
>>> +
>>> +_fixed_by_kernel_commit e917ff56c8e7 \
>>> +	"btrfs: determine synchronous writers from bio or writeback control"
>>> +
>>> +fio_config=$tmp.fio
>>> +
>>> +. ./common/dmlogwrites
>>> +
>>> +_require_scratch
>>> +_require_log_writes
>>> +
>>> +cat >$fio_config <<EOF
>>> +[global]
>>> +iodepth=128
>>> +direct=1
>>> +ioengine=libaio
>>> +rw=randwrite
>>> +runtime=1s
>>> +[job0]
>>> +rw=randwrite
>>> +filename=$SCRATCH_MNT/file
>>> +size=1g
>>> +fdatasync=1
>>> +EOF
>>> +
>>> +_require_fio $fio_config
>>> +
>>> +cat $fio_config >> $seqres.full
>>> +
>>> +_log_writes_init $SCRATCH_DEV
>>> +_log_writes_mkfs >> $seqres.full 2>&1
>>> +_log_writes_mark mkfs
>>> +
>>> +_log_writes_mount
>>> +
>>> +$FIO_PROG $fio_config > /dev/null 2>&1
>>> +_log_writes_unmount
>>> +
>>> +_log_writes_remove
>>> +
>>> +prev=$(_log_writes_mark_to_entry_number mkfs)
>>> +[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
>>> +cur=$(_log_writes_find_next_fua $prev)
>>> +[ -z "$cur" ] && _fail "failed to locate next FUA write"
>>> +
>>> +while [ ! -z "$cur" ]; do
>>> +	_log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
>>> +
>>> +	_check_scratch_fs
>>
>> This test fails on xfs because (afaict) replaying the log to $cur
>> results in $SCRATCH_DEV being a filesystem with a dirty log; and
>> xfs_repair fails when it is given a filesystem with a dirty log.
>>
>> I then fixed the test to mount and unmount the filesystem to recover
>> the dirty log before invoking xfs_repair:
>>
>> 	# xfs_repair won't run if the log is dirty
>> 	if [ $FSTYP = "xfs" ]; then
>> 		_scratch_mount
>> 		_scratch_unmount
>> 	fi
> 
> Thanks Darrick, you're right.
> I wonder whether we can always do a mount and unmount here, regardless of
> $FSTYP, if that doesn't affect testing on other filesystems?
> 
>> 	_check_scratch_fs
>>
>> But now the test takes a very long time to run because (on my system
>> anyway) the fio run can initiate 17,000 FUAs, which means that this loop
>> runs that many times.  100 iterations take about 45 seconds, so the full
>> loop takes about two hours.
>>
>> Is it necessary to iterate the loop that many times to reproduce
>> whatever issue btrfs had?
> 
> Yes, it takes a very long time on my side too:
>   FSTYP         -- ext4
>   PLATFORM      -- Linux/x86_64 dell-per750-47 6.12.0-rc4+ #1 SMP PREEMPT_DYNAMIC Fri Oct 25 14:25:45 EDT 2024
>   MKFS_OPTIONS  -- -F /dev/sda4
>   MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:root_t:s0 /dev/sda4 /mnt/xfstests/scratch
> 
>   generic/757        4247s
>   Ran: generic/757
>   Passed all 1 tests
> 
> So it would be better to reduce the test time as much as possible, and
> remove it from the "quick" group. (Maybe we could also add a tag to mark
> cases that take a very long time.)

Or maybe it should be made btrfs-specific again? That's how I originally 
wrote it, but Filipe Manana thought it ought to be genericized.

An advantage of making it FS-specific is that you can use the --fsck 
option in log-writes, which is about 10 times quicker than doing the 
loop in bash.
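
Roughly, and only as a sketch: instead of replaying to each FUA from bash
and running the checker every time, replay-log can be asked to run the
checker itself after every FUA. The function name, the REPLAY_LOG_PROG
variable and the device arguments below are placeholders of mine, not
names from the patch, and "btrfs check" is assumed as the checker.

```shell
# Sketch only: REPLAY_LOG_PROG and the device paths are placeholders.
REPLAY_LOG_PROG=${REPLAY_LOG_PROG:-replay-log}

# Replay the whole write log onto the replay device, letting replay-log
# run the checker after each FUA write instead of looping in the shell.
replay_and_check() {
	local log_dev="$1" replay_dev="$2"

	"$REPLAY_LOG_PROG" --log "$log_dev" --replay "$replay_dev" \
		--fsck "btrfs check $replay_dev" --check fua
}

# Example call (hypothetical devices):
#   replay_and_check /dev/sdc /dev/sdb
```

The checker command is hardcoded in that call, which is why this only
works for an FS-specific test.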

Mark