On Mon, Oct 28, 2024 at 02:57:28PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 15, 2024 at 04:39:34PM +0100, Mark Harmstone wrote:
> > Adds a test for a bug we encountered on Linux 6.4 on aarch64, where a
> > race could mean that csums weren't getting written to the log tree,
> > leading to corruption when it was replayed.
> >
> > The patches to detect this log tree corruption are in btrfs-progs 6.11.
> >
> > Signed-off-by: Mark Harmstone <maharmstone@xxxxxx>
> > ---
> > This is a genericized version of the test I originally proposed as
> > btrfs/333.
> >
> >  tests/generic/757     | 71 +++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/757.out |  2 ++
> >  2 files changed, 73 insertions(+)
> >  create mode 100755 tests/generic/757
> >  create mode 100644 tests/generic/757.out
> >
> > diff --git a/tests/generic/757 b/tests/generic/757
> > new file mode 100755
> > index 00000000..6ad3d01e
> > --- /dev/null
> > +++ b/tests/generic/757
> > @@ -0,0 +1,71 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# FS QA Test 757
> > +#
> > +# Test async dio with fsync to test a btrfs bug where a race meant that csums
> > +# weren't getting written to the log tree, causing corruptions on remount.
> > +# This can be seen on subpage FSes on Linux 6.4.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick metadata log recoveryloop
> > +
> > +_fixed_by_kernel_commit e917ff56c8e7 \
> > +	"btrfs: determine synchronous writers from bio or writeback control"
> > +
> > +fio_config=$tmp.fio
> > +
> > +. ./common/dmlogwrites
> > +
> > +_require_scratch
> > +_require_log_writes
> > +
> > +cat >$fio_config <<EOF
> > +[global]
> > +iodepth=128
> > +direct=1
> > +ioengine=libaio
> > +rw=randwrite
> > +runtime=1s
> > +[job0]
> > +rw=randwrite
> > +filename=$SCRATCH_MNT/file
> > +size=1g
> > +fdatasync=1
> > +EOF
> > +
> > +_require_fio $fio_config
> > +
> > +cat $fio_config >> $seqres.full
> > +
> > +_log_writes_init $SCRATCH_DEV
> > +_log_writes_mkfs >> $seqres.full 2>&1
> > +_log_writes_mark mkfs
> > +
> > +_log_writes_mount
> > +
> > +$FIO_PROG $fio_config > /dev/null 2>&1
> > +_log_writes_unmount
> > +
> > +_log_writes_remove
> > +
> > +prev=$(_log_writes_mark_to_entry_number mkfs)
> > +[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
> > +cur=$(_log_writes_find_next_fua $prev)
> > +[ -z "$cur" ] && _fail "failed to locate next FUA write"
> > +
> > +while [ ! -z "$cur" ]; do
> > +	_log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
> > +
> > +	_check_scratch_fs
>
> This test fails on xfs because (afaict) replaying the log to $cur
> results in $SCRATCH_DEV being a filesystem with a dirty log; and
> xfs_repair fails when it is given a filesystem with a dirty log.
>
> I then fixed the test to mount and unmount the filesystem to recover
> the dirty log before invoking xfs_repair:
>
> 	# xfs_repair won't run if the log is dirty
> 	if [ $FSTYP = "xfs" ]; then
> 		_scratch_mount
> 		_scratch_unmount
> 	fi

Thanks Darrick, you're right. I'm wondering whether we can always do a
mount and unmount here, regardless of $FSTYP, as long as that doesn't
affect the testing of other filesystems?

> 	_check_scratch_fs
>
> But now the test takes a very long time to run because (on my system
> anyway) the fio run can initiate 17,000 FUAs, which means that this loop
> runs that many times. 100 iterations take about 45 seconds, which works
> out to about two hours in total.
>
> Is it necessary to iterate the loop that many times to reproduce
> whatever issue btrfs had?

Yes, it takes a very long time on my side too:

  FSTYP         -- ext4
  PLATFORM      -- Linux/x86_64 dell-per750-47 6.12.0-rc4+ #1 SMP PREEMPT_DYNAMIC Fri Oct 25 14:25:45 EDT 2024
  MKFS_OPTIONS  -- -F /dev/sda4
  MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:root_t:s0 /dev/sda4 /mnt/xfstests/scratch

  generic/757        4247s
  Ran: generic/757
  Passed all 1 tests

So it would be better to reduce the test's run time as much as possible,
and to remove it from the "quick" group. (Maybe we could also add a group
tag to mark cases that take a very long time to run.)

This patch has already been merged into the for-next branch as:

  cf97fa373 generic: add test for missing btrfs csums in log when doing async on subpage vol

Please send one or two follow-up patches to fix the two problems above.

Thanks,
Zorro

>
> --D
>
> > +
> > +	prev=$cur
> > +	cur=$(_log_writes_find_next_fua $(($cur + 1)))
> > +	[ -z "$cur" ] && break
> > +done
> > +
> > +echo "Silence is golden"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/generic/757.out b/tests/generic/757.out
> > new file mode 100644
> > index 00000000..dfbc8094
> > --- /dev/null
> > +++ b/tests/generic/757.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 757
> > +Silence is golden
> > --
> > 2.44.2
> >
> >
>
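
To make the timing numbers above concrete, here is a rough sketch of the arithmetic in plain bash. It is not part of the merged test: the helper names estimate_replay_secs and replay_stride are hypothetical, and checking only every Nth FUA point assumes that the btrfs bug still reproduces when intermediate points are skipped, which has not been confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helpers, not fstests functions.

# estimate_replay_secs <total_fuas> <batch_iters> <batch_secs>
# Scale a measured batch cost (e.g. 100 iterations in 45 s) up to
# the total number of FUA points in the log.
estimate_replay_secs() {
	local total=$1 batch=$2 secs=$3
	echo $(( total * secs / batch ))
}

# replay_stride <total_fuas> <max_checks>
# Pick a stride so that no more than max_checks FUA points get
# replayed and checked. max_checks must be greater than zero.
replay_stride() {
	local total=$1 max=$2
	local stride=$(( (total + max - 1) / max ))
	if [ "$stride" -lt 1 ]; then
		stride=1
	fi
	echo "$stride"
}

# Numbers reported in this thread: 17,000 FUAs, 100 iterations in 45 s.
estimate_replay_secs 17000 100 45	# prints 7650 (a bit over two hours)
replay_stride 17000 100			# prints 170
```

With a stride of 170 the loop would check about 100 replay points, which at the measured rate is roughly 45 seconds of checking. Whether that is enough coverage to catch the original bug is a question for the follow-up patch.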