[PATCH] fstests: add fsstress + compaction test

Running compaction while fsstress is running can crash older kernels, as
per korg#218227 [0]. A fix has been posted [1], but that patch is not yet
in v6.9-rc4 and it requires changes for v6.9.

Today I found that v6.9-rc4 also hits an unrecoverable hung task
between compaction and fsstress while running generic/476 on the
following kdevops test sections [2]:

  * xfs_nocrc
  * xfs_nocrc_2k
  * xfs_nocrc_4k

Analyzing the trace I see the guest uses loopback block devices for the
fstests TEST_DEV, and the loopback files are sparse files on a btrfs
partition. Based on the traces [3] [4], the contention seems to be that
fsstress and compaction somehow race on folio_wait_bit_common().

We have this happening:

  a) kthread compaction --> migrate_pages_batch()
                --> folio_wait_bit_common()
  b) workqueue on btrfs writeback wb_workfn() --> extent_write_cache_pages()
                --> folio_wait_bit_common()
  c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
                --> folio_wait_bit_common()
  d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()

I tried to reproduce this but couldn't easily do so, so I wrote this test
to help. With it I have a 100% failure rate so far, out of 2 runs.

Given we also have korg#218227 and that patch likely needing
backporting, folks will want a reproducer for this issue. This should
hopefully help with that case and this new separate issue.
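
Outside of fstests, the core of the reproducer can be sketched as below:
a background loop writes 1 to /proc/sys/vm/compact_memory while a
workload runs in the foreground. This is only a sketch under a few
assumptions: COMPACT_FILE, INTERVAL, compaction_loop() and
run_with_compaction() are illustrative names and not part of fstests,
and writing to the real sysctl file needs root.

```shell
#!/bin/bash
# Sketch only: periodically trigger compaction while a workload runs.
# COMPACT_FILE and INTERVAL are illustrative names, not fstests variables.
COMPACT_FILE="${COMPACT_FILE:-/proc/sys/vm/compact_memory}"
INTERVAL="${INTERVAL:-15}"

# Trigger compaction every $INTERVAL seconds for as long as the run
# file passed as $1 exists.
compaction_loop()
{
	local runfile="$1"

	while [ -e "$runfile" ]; do
		# Writing 1 asks the kernel to compact all zones.
		echo 1 > "$COMPACT_FILE"
		sleep "$INTERVAL"
	done
}

# Run the command given as arguments with the compaction loop in the
# background, then stop the loop and return the command's exit status.
run_with_compaction()
{
	local runfile
	runfile=$(mktemp) || return 1

	compaction_loop "$runfile" &
	local loop_pid=$!

	"$@"
	local ret=$?

	# The loop notices the missing run file after its current sleep.
	rm -f "$runfile"
	wait "$loop_pid"
	return $ret
}
```

A hypothetical invocation, as root and with fsstress in $PATH, would be
something like: run_with_compaction fsstress -w -d /mnt/scratch -n 100000 -p 4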

To reproduce with kdevops just:

make defconfig-xfs_nocrc_2k  -j $(nproc)
make -j $(nproc)
make fstests
make linux
make fstests-baseline TESTS=generic/733
tail -f guestfs/*-xfs-nocrc-2k/console.log

[0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
[1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@xxxxxxx/T/#u
[2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
[3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
[4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb

Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
---

Note: kdevops uses its own fork of fstests which already has this merged,
so the above should just work. If this is your first time using kdevops,
be sure to read the README for first-time users:

https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md
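
For anyone sizing a run: the test derives the fsstress -p and -n values
from LOAD_FACTOR and TIME_FACTOR. The sketch below only mirrors that
arithmetic; fsstress_sizing() is a hypothetical helper that is not part
of fstests, and it assumes both factors default to 1 when unset.

```shell
#!/bin/bash
# Sketch: mirror the sizing arithmetic used by the test.
# fsstress_sizing is a hypothetical helper, not part of fstests.
fsstress_sizing()
{
	local load_factor="${1:-1}"
	local time_factor="${2:-1}"

	# Same expressions as in the test body.
	local nr_cpus=$((load_factor * 4))
	local nr_ops=$((25000 * nr_cpus * time_factor))

	echo "nr_cpus=$nr_cpus nr_ops=$nr_ops"
}

fsstress_sizing 1 1	# prints: nr_cpus=4 nr_ops=100000
fsstress_sizing 2 3	# prints: nr_cpus=8 nr_ops=600000
```

When SOAK_DURATION is set, the test additionally passes --duration to
fsstress.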

 common/rc             |  7 ++++++
 tests/generic/744     | 56 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/744.out |  2 ++
 3 files changed, 65 insertions(+)
 create mode 100755 tests/generic/744
 create mode 100644 tests/generic/744.out

diff --git a/common/rc b/common/rc
index b7b77ac1b46d..d4432f5ce259 100644
--- a/common/rc
+++ b/common/rc
@@ -120,6 +120,13 @@ _require_hugepages()
 		_notrun "Kernel does not report huge page size"
 }
 
+# Requires CONFIG_COMPACTION
+_require_compaction()
+{
+	if [ ! -f /proc/sys/vm/compact_memory ]; then
+		_notrun "Need compaction enabled CONFIG_COMPACTION=y"
+	fi
+}
 # Get hugepagesize in bytes
 _get_hugepagesize()
 {
diff --git a/tests/generic/744 b/tests/generic/744
new file mode 100755
index 000000000000..2b3c0c7e92fb
--- /dev/null
+++ b/tests/generic/744
@@ -0,0 +1,56 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
+#
+# FS QA Test 744
+#
+# fsstress + compaction test
+#
+. ./common/preamble
+_begin_fstest auto rw long_rw stress soak smoketest
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+}
+
+# Import common functions.
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+
+_require_scratch
+_require_compaction
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden"
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+nr_cpus=$((LOAD_FACTOR * 4))
+nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
+fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
+
+# start a background loop that triggers memory compaction every 15 seconds
+runfile="$tmp.compaction"
+touch $runfile
+while [ -e $runfile ]; do
+	echo 1 > /proc/sys/vm/compact_memory
+	sleep 15
+done &
+compaction_pid=$!
+
+test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
+
+$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
+
+rm -f $runfile
+wait > /dev/null 2>&1
+
+status=0
+exit
diff --git a/tests/generic/744.out b/tests/generic/744.out
new file mode 100644
index 000000000000..205c684fa995
--- /dev/null
+++ b/tests/generic/744.out
@@ -0,0 +1,2 @@
+QA output created by 744
+Silence is golden
-- 
2.43.0




