Re: [PATCH] fstests: add fsstress + compaction test

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 17, 2024 at 05:13:56PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels as per
> korg#218227 [0], the fix for that [0] has been posted [1] but that patch
> is not yet on v6.9-rc4 and the patch requires changes for v6.9.

It doesn't require changes, it just has prerequisites:

https://lore.kernel.org/all/ZgHhcojXc9QjynUI@xxxxxxxxxxxxxxxxxxxx/

> Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
> between compaction and fsstress while running generic/476 on the
> following kdevops test sections [2]:
> 
>   * xfs_nocrc
>   * xfs_nocrc_2k
>   * xfs_nocrc_4k
> 
> Analyzing the trace I see the guest uses loopback block devices for the
> fstests TEST_DEV, the loopback file uses sparsefiles on a btrfs
> partition. The contention based on traces [3] [4] seems to be that we
> have somehow have fsstress + compaction race on folio_wait_bit_common().

What do you mean by "race"?  Here's what I see:

Apr 16 23:06:11 base-xfs-nocrc-2k kernel: INFO: task kcompactd0:72 blocked for more than 120 seconds.
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:       Not tainted 6.9.0-rc4 #4
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: task:kcompactd0      state:D stack:0     pid:72    tgid:72    ppid:2      flags:0x00004000
Apr 16 23:06:11 base-xfs-nocrc-2k kernel: Call Trace:
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  <TASK>
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  __schedule+0x3d9/0xaf0
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  schedule+0x26/0xf0
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  io_schedule+0x42/0x70
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  folio_wait_bit_common+0x123/0x370
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  ? __pfx_wake_page_function+0x10/0x10
Apr 16 23:06:11 base-xfs-nocrc-2k kernel:  migrate_pages_batch+0x69a/0xd70

But you didn't run the backtrace through scripts/decode_stacktrace.sh
so I can't figure out what we're waiting on.

> We have this happening:
> 
>   a) kthread compaction --> migrate_pages_batch()
>                 --> folio_wait_bit_common()
>   b) workqueue on btrfs writeback wb_workfn  --> extent_write_cache_pages()
>                 --> folio_wait_bit_common()
>   c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
>                 --> folio_wait_bit_common()
>   d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()
> 
> I tried to reproduce but couldn't easily do so, so I wrote this test
> to help, and with this I have 100% failure rate so far out of 2 runs.
> 
> Given we also have korg#218227 and that patch likely needing
> backporting, folks will want a reproducer for this issue. This should
> hopefully help with that case and this new separate issue.
> 
> To reproduce with kdevops just:
> 
> make defconfig-xfs_nocrc_2k  -j $(nproc)
> make -j $(nproc)
> make fstests
> make linux
> make fstests-baseline TESTS=generic/733
> tail -f guestfs/*-xfs-nocrc-2k/console.log
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@xxxxxxx/T/#u
> [2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
> [3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
> [4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb
> 
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
> 
> Note: kdevops uses its own fork of fstests which has this merged
> already, so the above should just work. If it's your first time using
> kdevops be sure to just read the README for the first time users:
> 
> https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md
> 
>  common/rc             |  7 ++++++
>  tests/generic/744     | 56 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/744.out |  2 ++
>  3 files changed, 65 insertions(+)
>  create mode 100755 tests/generic/744
>  create mode 100644 tests/generic/744.out
> 
> diff --git a/common/rc b/common/rc
> index b7b77ac1b46d..d4432f5ce259 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -120,6 +120,13 @@ _require_hugepages()
>  		_notrun "Kernel does not report huge page size"
>  }
>  
> +# Requires CONFIG_COMPACTION
> +_require_compaction()
> +{
> +	if [ ! -f /proc/sys/vm/compact_memory ]; then
> +	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> +	fi
> +}
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/744 b/tests/generic/744
> new file mode 100755
> index 000000000000..2b3c0c7e92fb
> --- /dev/null
> +++ b/tests/generic/744
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
> +#
> +# FS QA Test 744
> +#
> +# fsstress + compaction test
> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs generic
> +
> +_require_scratch
> +_require_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background getxattr loop for the existing xattr
> +runfile="$tmp.getfattr"
> +touch $runfile
> +while [ -e $runfile ]; do
> +	echo 1 > /proc/sys/vm/compact_memory
> +	sleep 15
> +done &
> +getfattr_pid=$!
> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +
> +rm -f $runfile
> +wait > /dev/null 2>&1
> +
> +status=0
> +exit
> diff --git a/tests/generic/744.out b/tests/generic/744.out
> new file mode 100644
> index 000000000000..205c684fa995
> --- /dev/null
> +++ b/tests/generic/744.out
> @@ -0,0 +1,2 @@
> +QA output created by 744
> +Silence is golden
> -- 
> 2.43.0
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux