On Mon, Jun 10, 2024 at 08:02:00PM -0700, Luis Chamberlain wrote: > Running compaction while we run fsstress can crash older kernels as per > korg#218227 [0], the fix for that [0] has been posted [1] that patch > was merged on v6.9-rc6 fixed by commit d99e3140a4d3 ("mm: turn > folio_test_hugetlb into a PageType"). However even on v6.10-rc2 where > this kernel commit is already merged we can still deadlock when running > fsstress and at the same time triggering compaction, this is a new > issue being reported now this through patch, but this patch also > serves as a reproducer with a high confidence. It always deadlocks. > If you enable CONFIG_PROVE_LOCKING with the defaults you will end up > with a complaint about increasing MAX_LOCKDEP_CHAIN_HLOCKS [1], if > you adjust that you then end up with a few soft lockup complaints and > some possible deadlock candidates to evaluate [2]. > > Provide a simple reproducer and pave the way so we keep on testing this. > > Without lockdep enabled we silently deadlock on the first run of the > test without the fix applied. With lockdep enabled you get a splat about > the possible deadlock on the first run of the test. > > [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227 > [1] https://gist.github.com/mcgrof/824913b645892214effeb1631df75072 > [2] https://gist.github.com/mcgrof/926e183d21c5c4c55d74ec90197bd77a > > Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx> > --- > common/rc | 7 +++++ > tests/generic/750 | 62 +++++++++++++++++++++++++++++++++++++++++++ > tests/generic/750.out | 2 ++ > 3 files changed, 71 insertions(+) > create mode 100755 tests/generic/750 > create mode 100644 tests/generic/750.out > > diff --git a/common/rc b/common/rc > index e812a2f7cc67..18ad25662d5c 100644 > --- a/common/rc > +++ b/common/rc > @@ -151,6 +151,13 @@ _require_hugepages() > _notrun "Kernel does not report huge page size" > } > > +# Requires CONFIG_COMPACTION > +_require_vm_compaction() > +{ > + if [ ! -f /proc/sys/vm/compact_memory ]; then > + _notrun "Need compaction enabled CONFIG_COMPACTION=y" > + fi > +} > # Get hugepagesize in bytes > _get_hugepagesize() > { > diff --git a/tests/generic/750 b/tests/generic/750 > new file mode 100755 > index 000000000000..334ab011dfa0 > --- /dev/null > +++ b/tests/generic/750 > @@ -0,0 +1,62 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2024 Luis Chamberlain. All Rights Reserved. > +# > +# FS QA Test 750 > +# > +# fsstress + memory compaction test > +# > +. ./common/preamble > +_begin_fstest auto rw long_rw stress soak smoketest > + > +_cleanup() > +{ > + cd / > + rm -f $runfile > + rm -f $tmp.* > + kill -9 $trigger_compaction_pid > /dev/null 2>&1 > + $KILLALL_PROG -9 fsstress > /dev/null 2>&1 > + > + wait > /dev/null 2>&1 > +} > + > +# Import common functions. > + > +# real QA test starts here > + > +_supported_fs generic > + > +_require_scratch > +_require_vm_compaction > +_require_command "$KILLALL_PROG" "killall" > + > +# We still deadlock with this test on v6.10-rc2, we need more work. > +# but the below makes things better. > +_fixed_by_git_commit kernel d99e3140a4d3 \ > + "mm: turn folio_test_hugetlb into a PageType" > + > +echo "Silence is golden" > + > +_scratch_mkfs > $seqres.full 2>&1 > +_scratch_mount >> $seqres.full 2>&1 > + > +nr_cpus=$((LOAD_FACTOR * 4)) > +nr_ops=$((25000 * nr_cpus * TIME_FACTOR)) > +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus) > + > +# start a background trigger for memory compaction > +runfile="$tmp.compaction" > +touch $runfile > +while [ -e $runfile ]; do > + echo 1 > /proc/sys/vm/compact_memory > + sleep 5 > +done & > +trigger_compaction_pid=$! > + > +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION") > + > +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full > +wait > /dev/null 2>&1 Won't this "wait" wait forever (except a ctrl+C), due to no one removes the $runfile? Thanks, Zorro > + > +status=0 > +exit > diff --git a/tests/generic/750.out b/tests/generic/750.out > new file mode 100644 > index 000000000000..bd79507b632e > --- /dev/null > +++ b/tests/generic/750.out > @@ -0,0 +1,2 @@ > +QA output created by 750 > +Silence is golden > -- > 2.43.0 > >