Running compaction while we run fsstress can crash older kernels as per
korg#218227 [0]. The fix for that has been posted [1], but the patch is
not yet in v6.9-rc4 and it requires changes for v6.9.

Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
between compaction and fsstress while running generic/476 on the
following kdevops test sections [2]:

  * xfs_nocrc
  * xfs_nocrc_2k
  * xfs_nocrc_4k

Analyzing the trace I see the guest uses loopback block devices for the
fstests TEST_DEV, and the loopback file is a sparse file on a btrfs
partition. Based on the traces [3] [4], the contention seems to be that
fsstress and compaction somehow race on folio_wait_bit_common().

We have this happening:

  a) kthread compaction --> migrate_pages_batch()
       --> folio_wait_bit_common()
  b) workqueue on btrfs writeback wb_workfn --> extent_write_cache_pages()
       --> folio_wait_bit_common()
  c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
       --> folio_wait_bit_common()
  d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()

I couldn't easily reproduce this otherwise, so I wrote this test to
help. With it I see a 100% failure rate so far, out of 2 runs.

Given we also have korg#218227 and that patch likely needs backporting,
folks will want a reproducer for that issue. This test should hopefully
help with that case as well as with this new, separate issue.

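For anyone who wants to try the same pattern outside of fstests, here is
a minimal sketch of what the test does: a background loop that triggers
compaction every few seconds while a workload runs in the foreground.
The helper names (compaction_loop, run_with_compaction) and the
COMPACT_KNOB / INTERVAL variables are made up for illustration and are
not part of fstests. Writing to the real knob requires root.

```shell
#!/bin/bash
# Hypothetical standalone sketch, not part of fstests.
# COMPACT_KNOB and INTERVAL are illustrative names; the defaults mirror
# what the test uses (/proc/sys/vm/compact_memory, every 15 seconds).
COMPACT_KNOB=${COMPACT_KNOB:-/proc/sys/vm/compact_memory}
INTERVAL=${INTERVAL:-15}

# Trigger compaction repeatedly for as long as the run file exists.
compaction_loop()
{
	local runfile=$1

	while [ -e "$runfile" ]; do
		echo 1 > "$COMPACT_KNOB"
		sleep "$INTERVAL"
	done
}

# Run the given command while compaction_loop runs in the background,
# then stop the loop and return the command's exit status.
run_with_compaction()
{
	local runfile pid rc

	runfile=$(mktemp)
	compaction_loop "$runfile" &
	pid=$!

	"$@"
	rc=$?

	# Removing the run file ends the loop after its current sleep.
	rm -f "$runfile"
	wait "$pid"
	return $rc
}
```

As root, something like "run_with_compaction fsstress -w -d /mnt/scratch
-n 100000 -p 4" would approximate the test, where the mount point and
the fsstress counts are placeholders to adjust for your setup.
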
To reproduce with kdevops just:

make defconfig-xfs_nocrc_2k -j $(nproc)
make -j $(nproc)
make fstests
make linux
make fstests-baseline TESTS=generic/733
tail -f guestfs/*-xfs-nocrc-2k/console.log

[0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
[1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@xxxxxxx/T/#u
[2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
[3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
[4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb

Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
---

Note: kdevops uses its own fork of fstests which has this merged
already, so the above should just work. If it's your first time using
kdevops, be sure to read the README for first-time users:

https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md

 common/rc             |  7 ++++++
 tests/generic/744     | 56 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/744.out |  2 ++
 3 files changed, 65 insertions(+)
 create mode 100755 tests/generic/744
 create mode 100644 tests/generic/744.out

diff --git a/common/rc b/common/rc
index b7b77ac1b46d..d4432f5ce259 100644
--- a/common/rc
+++ b/common/rc
@@ -120,6 +120,13 @@ _require_hugepages()
 		_notrun "Kernel does not report huge page size"
 }
 
+# Requires CONFIG_COMPACTION
+_require_compaction()
+{
+	if [ ! -f /proc/sys/vm/compact_memory ]; then
+		_notrun "Need compaction enabled CONFIG_COMPACTION=y"
+	fi
+}
 # Get hugepagesize in bytes
 _get_hugepagesize()
 {
diff --git a/tests/generic/744 b/tests/generic/744
new file mode 100755
index 000000000000..2b3c0c7e92fb
--- /dev/null
+++ b/tests/generic/744
@@ -0,0 +1,56 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Luis Chamberlain. All Rights Reserved.
+#
+# FS QA Test 744
+#
+# fsstress + compaction test
+#
+. ./common/preamble
+_begin_fstest auto rw long_rw stress soak smoketest
+
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+}
+
+# Import common functions.
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+
+_require_scratch
+_require_compaction
+_require_command "$KILLALL_PROG" "killall"
+
+echo "Silence is golden"
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+nr_cpus=$((LOAD_FACTOR * 4))
+nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
+fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
+
+# start a background loop which triggers compaction every 15 seconds
+runfile="$tmp.compaction"
+touch $runfile
+while [ -e $runfile ]; do
+	echo 1 > /proc/sys/vm/compact_memory
+	sleep 15
+done &
+compaction_pid=$!
+
+test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
+
+$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
+
+rm -f $runfile
+wait > /dev/null 2>&1
+
+status=0
+exit
diff --git a/tests/generic/744.out b/tests/generic/744.out
new file mode 100644
index 000000000000..205c684fa995
--- /dev/null
+++ b/tests/generic/744.out
@@ -0,0 +1,2 @@
+QA output created by 744
+Silence is golden
-- 
2.43.0