Re: [PATCH] fstests: add fsstress + compaction test

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Wed, Apr 17, 2024 at 05:13:56PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels as per
> korg#218227 [0], the fix for that [0] has been posted [1] but that patch
> is not yet on v6.9-rc4 and the patch requires changes for v6.9.
> 
> Today I find that v6.9-rc4 is also hitting an unrecoverable hung task
> between compaction and fsstress while running generic/476 on the
> following kdevops test sections [2]:
> 
>   * xfs_nocrc
>   * xfs_nocrc_2k
>   * xfs_nocrc_4k
> 
> Analyzing the trace I see the guest uses loopback block devices for the
> fstests TEST_DEV, the loopback file uses sparsefiles on a btrfs
> partition. The contention based on traces [3] [4] seems to be that we
> have somehow have fsstress + compaction race on folio_wait_bit_common().
> 
> We have this happening:
> 
>   a) kthread compaction --> migrate_pages_batch()
>                 --> folio_wait_bit_common()
>   b) workqueue on btrfs writeback wb_workfn  --> extent_write_cache_pages()
>                 --> folio_wait_bit_common()
>   c) workqueue on loopback loop_rootcg_workfn() --> filemap_fdatawrite_wbc()
>                 --> folio_wait_bit_common()
>   d) kthread xfsaild --> blk_mq_submit_bio() --> wbt_wait()
> 
> I tried to reproduce but couldn't easily do so, so I wrote this test
> to help, and with this I have 100% failure rate so far out of 2 runs.
> 
> Given we also have korg#218227 and that patch likely needing
> backporting, folks will want a reproducer for this issue. This should
> hopefully help with that case and this new separate issue.
> 
> To reproduce with kdevops just:
> 
> make defconfig-xfs_nocrc_2k  -j $(nproc)
> make -j $(nproc)
> make fstests
> make linux
> make fstests-baseline TESTS=generic/733
> tail -f guestfs/*-xfs-nocrc-2k/console.log
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://lore.kernel.org/all/7ee2bb8c-441a-418b-ba3a-d305f69d31c8@xxxxxxx/T/#u
> [2] https://github.com/linux-kdevops/kdevops/blob/main/playbooks/roles/fstests/templates/xfs/xfs.config
> [3] https://gist.github.com/mcgrof/4dfa3264f513ce6ca398414326cfab84
> [4] https://gist.github.com/mcgrof/f40a9f31a43793dac928ce287cfacfeb
> 
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
> 
> Note: kdevops uses its own fork of fstests which has this merged
> already, so the above should just work. If it's your first time using
> kdevops be sure to just read the README for the first time users:
> 
> https://github.com/linux-kdevops/kdevops/blob/main/docs/kdevops-first-run.md
> 
>  common/rc             |  7 ++++++
>  tests/generic/744     | 56 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/744.out |  2 ++
>  3 files changed, 65 insertions(+)
>  create mode 100755 tests/generic/744
>  create mode 100644 tests/generic/744.out
> 
> diff --git a/common/rc b/common/rc
> index b7b77ac1b46d..d4432f5ce259 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -120,6 +120,13 @@ _require_hugepages()
>  		_notrun "Kernel does not report huge page size"
>  }
>  
> +# Requires CONFIG_COMPACTION
> +_require_compaction()

I'm not sure if we should name it as "_require_vm_compaction", does linux
have other "compaction" or only memory compaction?

> +{
> +	if [ ! -f /proc/sys/vm/compact_memory ]; then
> +	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> +	fi
> +}
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/744 b/tests/generic/744
> new file mode 100755
> index 000000000000..2b3c0c7e92fb
> --- /dev/null
> +++ b/tests/generic/744
> @@ -0,0 +1,56 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
> +#
> +# FS QA Test 744
> +#
> +# fsstress + compaction test

fsstress + memory compaction ?

Looks like this case is copied from g/476, just add memory_compaction
test. That makes sense to me from the test side.

I'm a bit confused on your discussion about an old bug and a new bug(?)
you just found. Looks like you're reporting a bug, and provide a test
case to fstests@ by the way. Anyway, I think there's not objection on
this test itself, right? And is this test for someone known bug or not?

> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.

Useless comment~

> +_supported_fs generic
> +
> +_require_scratch
> +_require_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +echo "Silence is golden."
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background getxattr loop for the existing xattr
> +runfile="$tmp.getfattr"
> +touch $runfile
> +while [ -e $runfile ]; do
> +	echo 1 > /proc/sys/vm/compact_memory
> +	sleep 15
> +done &
> +getfattr_pid=$!

I didn't see any other place use this "getfattr_pid". Better to deal with
it in _cleanup().

> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +
> +rm -f $runfile
> +wait > /dev/null 2>&1

Better to do these things in _cleanup() function, make sure all background
processes can be done in _cleanup.

> +
> +status=0
> +exit
> diff --git a/tests/generic/744.out b/tests/generic/744.out
> new file mode 100644
> index 000000000000..205c684fa995
> --- /dev/null
> +++ b/tests/generic/744.out
> @@ -0,0 +1,2 @@
> +QA output created by 744
> +Silence is golden
> -- 
> 2.43.0
> 
> 





[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux