On Tue, Oct 15, 2024 at 04:39:34PM +0100, Mark Harmstone wrote:
> Adds a test for a bug we encountered on Linux 6.4 on aarch64, where a
> race could mean that csums weren't getting written to the log tree,
> leading to corruption when it was replayed.
> 
> The patches to detect this log tree corruption are in btrfs-progs 6.11.
> 
> Signed-off-by: Mark Harmstone <maharmstone@xxxxxx>
> ---
> This is a genericized version of the test I originally proposed as
> btrfs/333.
> 
>  tests/generic/757     | 71 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/757.out |  2 ++
>  2 files changed, 73 insertions(+)
>  create mode 100755 tests/generic/757
>  create mode 100644 tests/generic/757.out
> 
> diff --git a/tests/generic/757 b/tests/generic/757
> new file mode 100755
> index 00000000..6ad3d01e
> --- /dev/null
> +++ b/tests/generic/757
> @@ -0,0 +1,71 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# FS QA Test 757
> +#
> +# Test async dio with fsync to test a btrfs bug where a race meant that csums
> +# weren't getting written to the log tree, causing corruptions on remount.
> +# This can be seen on subpage FSes on Linux 6.4.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick metadata log recoveryloop
> +
> +_fixed_by_kernel_commit e917ff56c8e7 \
> +	"btrfs: determine synchronous writers from bio or writeback control"
> +
> +fio_config=$tmp.fio
> +
> +. ./common/dmlogwrites
> +
> +_require_scratch
> +_require_log_writes
> +
> +cat >$fio_config <<EOF
> +[global]
> +iodepth=128
> +direct=1
> +ioengine=libaio
> +rw=randwrite
> +runtime=1s
> +[job0]
> +rw=randwrite
> +filename=$SCRATCH_MNT/file
> +size=1g
> +fdatasync=1
> +EOF
> +
> +_require_fio $fio_config
> +
> +cat $fio_config >> $seqres.full
> +
> +_log_writes_init $SCRATCH_DEV
> +_log_writes_mkfs >> $seqres.full 2>&1
> +_log_writes_mark mkfs
> +
> +_log_writes_mount
> +
> +$FIO_PROG $fio_config > /dev/null 2>&1
> +_log_writes_unmount
> +
> +_log_writes_remove
> +
> +prev=$(_log_writes_mark_to_entry_number mkfs)
> +[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
> +cur=$(_log_writes_find_next_fua $prev)
> +[ -z "$cur" ] && _fail "failed to locate next FUA write"
> +
> +while [ ! -z "$cur" ]; do
> +	_log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
> +
> +	_check_scratch_fs

This test fails on xfs because (afaict) replaying the log to $cur
results in $SCRATCH_DEV being a filesystem with a dirty log; and
xfs_repair fails when it is given a filesystem with a dirty log.

I then fixed the test to mount and unmount the filesystem to recover
the dirty log before invoking xfs_repair:

	# xfs_repair won't run if the log is dirty
	if [ $FSTYP = "xfs" ]; then
		_scratch_mount
		_scratch_unmount
	fi

	_check_scratch_fs

But now the test takes a very long time to run because (on my system
anyway) the fio run can initiate 17,000 FUAs, which means that this
loop runs that many times.  100 iterations take about 45 seconds, so
the full run works out to roughly two hours.

Is it necessary to iterate the loop that many times to reproduce
whatever issue btrfs had?

--D

> +
> +	prev=$cur
> +	cur=$(_log_writes_find_next_fua $(($cur + 1)))
> +	[ -z "$cur" ] && break
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/757.out b/tests/generic/757.out
> new file mode 100644
> index 00000000..dfbc8094
> --- /dev/null
> +++ b/tests/generic/757.out
> @@ -0,0 +1,2 @@
> +QA output created by 757
> +Silence is golden
> -- 
> 2.44.2
> 
> 
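
If replaying every FUA entry isn't actually needed to hit the original
btrfs race, one possible way to bound the runtime would be to stride
over the FUA entries instead of checking each one.  This is only a
rough, untested sketch reusing the helpers the test already calls; the
stride of 100 is an arbitrary placeholder, not a recommended value:

	# Hypothetical variant: replay and check only every Nth FUA entry.
	stride=100

	prev=$(_log_writes_mark_to_entry_number mkfs)
	[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
	cur=$(_log_writes_find_next_fua $prev)
	[ -z "$cur" ] && _fail "failed to locate next FUA write"

	while [ ! -z "$cur" ]; do
		_log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full

		# xfs_repair won't run if the log is dirty
		if [ $FSTYP = "xfs" ]; then
			_scratch_mount
			_scratch_unmount
		fi
		_check_scratch_fs

		# skip ahead $stride FUA entries instead of one
		for ((i = 0; i < stride; i++)); do
			cur=$(_log_writes_find_next_fua $(($cur + 1)))
			[ -z "$cur" ] && break
		done
	done

That trades replay coverage for runtime, so whether it's acceptable
depends on how reliably the original corruption reproduces when only a
subset of the FUA points is checked.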