Re: [PATCH] generic: add gc stress test

On Wed, May 08, 2024 at 07:08:01AM +0000, Hans Holmberg wrote:
> On 2024-04-17 16:50, Hans Holmberg wrote:
> > On 2024-04-17 16:07, Zorro Lang wrote:
> >> On Wed, Apr 17, 2024 at 01:21:39PM +0000, Hans Holmberg wrote:
> >>> On 2024-04-17 14:43, Zorro Lang wrote:
> >>>> On Tue, Apr 16, 2024 at 11:54:37AM -0700, Darrick J. Wong wrote:
> >>>>> On Tue, Apr 16, 2024 at 09:07:43AM +0000, Hans Holmberg wrote:
> >>>>>> +Zorro (doh!)
> >>>>>>
> >>>>>> On 2024-04-15 13:23, Hans Holmberg wrote:
> >>>>>>> This test stresses garbage collection for file systems by first filling
> >>>>>>> up a scratch mount to a specific usage point with files of random size,
> >>>>>>> then doing overwrites in parallel with deletes to fragment the backing
> >>>>>>> storage, forcing reclaim.
> >>>>>>>
> >>>>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxx>
> >>>>>>> ---
> >>>>>>>
> >>>>>>> Test results in my setup (kernel 6.8.0-rc4+)
> >>>>>>> 	f2fs on zoned nullblk: pass (77s)
> >>>>>>> 	f2fs on conventional nvme ssd: pass (13s)
> >>>>>>> 	btrfs on zoned nullblk: fails (-ENOSPC)
> >>>>>>> 	btrfs on conventional nvme ssd: fails (-ENOSPC)
> >>>>>>> 	xfs on conventional nvme ssd: pass (8s)
> >>>>>>>
> >>>>>>> Johannes(cc) is working on the btrfs ENOSPC issue.
> >>>>>>> 	
> >>>>>>>      tests/generic/744     | 124 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>      tests/generic/744.out |   6 ++
> >>>>>>>      2 files changed, 130 insertions(+)
> >>>>>>>      create mode 100755 tests/generic/744
> >>>>>>>      create mode 100644 tests/generic/744.out
> >>>>>>>
> >>>>>>> diff --git a/tests/generic/744 b/tests/generic/744
> >>>>>>> new file mode 100755
> >>>>>>> index 000000000000..2c7ab76bf8b1
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tests/generic/744
> >>>>>>> @@ -0,0 +1,124 @@
> >>>>>>> +#! /bin/bash
> >>>>>>> +# SPDX-License-Identifier: GPL-2.0
> >>>>>>> +# Copyright (c) 2024 Western Digital Corporation.  All Rights Reserved.
> >>>>>>> +#
> >>>>>>> +# FS QA Test No. 744
> >>>>>>> +#
> >>>>>>> +# Inspired by btrfs/273 and generic/015
> >>>>>>> +#
> >>>>>>> +# This test stresses garbage collection in file systems
> >>>>>>> +# by first filling up a scratch mount to a specific usage point with
> >>>>>>> +# files of random size, then doing overwrites in parallel with
> >>>>>>> +# deletes to fragment the backing zones, forcing reclaim.
> >>>>>>> +
> >>>>>>> +. ./common/preamble
> >>>>>>> +_begin_fstest auto
> >>>>>>> +
> >>>>>>> +# real QA test starts here
> >>>>>>> +
> >>>>>>> +_require_scratch
> >>>>>>> +
> >>>>>>> +# This test requires specific data space usage, skip if we have compression
> >>>>>>> +# enabled.
> >>>>>>> +_require_no_compress
> >>>>>>> +
> >>>>>>> +M=$((1024 * 1024))
> >>>>>>> +min_fsz=$((1 * ${M}))
> >>>>>>> +max_fsz=$((256 * ${M}))
> >>>>>>> +bs=${M}
> >>>>>>> +fill_percent=95
> >>>>>>> +overwrite_percentage=20
> >>>>>>> +seq=0
> >>>>>>> +
> >>>>>>> +_create_file() {
> >>>>>>> +	local file_name=${SCRATCH_MNT}/data_$1
> >>>>>>> +	local file_sz=$2
> >>>>>>> +	local dd_extra=$3
> >>>>>>> +
> >>>>>>> +	POSIXLY_CORRECT=yes dd if=/dev/zero of=${file_name} \
> >>>>>>> +		bs=${bs} count=$(( $file_sz / ${bs} )) \
> >>>>>>> +		status=none $dd_extra  2>&1
> >>>>>>> +
> >>>>>>> +	status=$?
> >>>>>>> +	if [ $status -ne 0 ]; then
> >>>>>>> +		echo "Failed writing $file_name" >>$seqres.full
> >>>>>>> +		exit
> >>>>>>> +	fi
> >>>>>>> +}
> >>>>>
> >>>>> I wonder, is there a particular reason for doing all these file
> >>>>> operations with shell code instead of using fsstress to create and
> >>>>> delete files to fill the fs and stress all the zone-gc code?  This test
> >>>>> reminds me a lot of generic/476 but with more fork()ing.
> >>>>
> >>>> /me has the same confusion. Can this test cover more things than using
> >>>> fsstress (to do a reclaim test)? Or does it uncover some known bugs which
> >>>> other cases can't?
> >>>
> >>> ah, adding some more background is probably useful:
> >>>
> >>> I've been using this test to stress the crap out of the zoned xfs garbage
> >>> collection / write throttling implementation for zoned rt subvolume
> >>> support in xfs, and it has found a number of issues during implementation
> >>> that I did not reproduce by other means.
> >>>
> >>> I think it also has wider applicability as it triggers bugs in btrfs.
> >>> f2fs passes without issues, but probably benefits from a quick smoke gc
> >>> test as well. Discussed this with Bart and Daeho (now in cc) before
> >>> submitting.
> >>>
> >>> Using fsstress would be cool, but as far as I can tell it cannot
> >>> be told to operate at a specific file system usage point, which
> >>> is a key thing for this test.
> >>
> >> As a random test case, if this case can be transformed to use fsstress and
> >> cover the same issues, that would be nice.
> >>
> >> But if, as a regression test case, it has its own particular coverage, and the
> >> issues it covers can't be reproduced the fsstress way, then let's work on this
> >> bash script one.
> >>
> >> Any thoughts?
> > 
> > Yeah, I think bash is preferable for this particular test case.
> > Bash also makes it easy to hack for people's private uses.
> > 
> > I use longer versions of this test (increasing overwrite_percentage)
> > for weekly testing.
> > 
> > If we need fsstress for reproducing any future gc bug we can add
> > what's missing to it then.
> > 
> > Does that make sense?
> > 
> 
> Hey Zorro,
> 
> Any remaining concerns for adding this test? I could run it across
> more file systems (bcachefs could be interesting) and share the results
> if need be.

Hi,

I remember you mentioned that btrfs fails this test, and I can reproduce that
on btrfs [1] with a regular disk. Have you figured out the reason? I don't
want to suddenly give btrfs a test failure without a proper explanation :)
If it's a test case issue, it would be better to fix the case for btrfs.

Thanks,
Zorro

[1]
# ./check generic/744
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 6.9.0-0.rc5.20240425gite88c4cfcb7b8.47.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 25 14:21:52 UTC 2024
MKFS_OPTIONS  -- /dev/sda4
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda4 /mnt/scratch

generic/744 115s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests/results//generic/744.out.bad)
    --- tests/generic/744.out   2024-05-08 16:11:14.476635417 +0800
    +++ /root/git/xfstests/results//generic/744.out.bad 2024-05-08 16:46:03.617194377 +0800
    @@ -2,5 +2,4 @@
     Starting fillup using direct IO
     Starting mixed write/delete test using direct IO
     Starting mixed write/delete test using buffered IO
    -Syncing
    -Done, all good
    +dd: error writing '/mnt/scratch/data_82': No space left on device
    ...
    (Run 'diff -u /root/git/xfstests/tests/generic/744.out /root/git/xfstests/results//generic/744.out.bad'  to see the entire diff)
Ran: generic/744
Failures: generic/744
Failed 1 of 1 tests
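
One way to narrow this down might be to log how full the scratch mount
actually is at the moment dd hits ENOSPC, and how far that is from the
intended fill point. Below is a minimal sketch, assuming plain bash and
POSIX df output. The helper names (usage_percent, bytes_to_fill,
report_usage) are hypothetical; they are not part of the posted patch or of
the fstests common helpers. Also note that df may not tell the whole story
on btrfs, so the numbers are only a hint.

```shell
#!/bin/bash
# Hypothetical triage helpers; not part of generic/744 or fstests.

# Integer percentage of "used" out of "total" (same unit for both).
# Assumes total > 0.
usage_percent() {
	local used=$1 total=$2

	echo $(( used * 100 / total ))
}

# Bytes still to be written to bring a filesystem of $1 bytes, with $2
# bytes already used, up to $3 percent usage. Prints 0 if the target
# has already been reached.
bytes_to_fill() {
	local total=$1 used=$2 percent=$3
	local target=$(( total * percent / 100 ))

	if [ "$target" -gt "$used" ]; then
		echo $(( target - used ))
	else
		echo 0
	fi
}

# One-line usage summary for a mount point, taken from POSIX-format df
# output in 1 KiB blocks. Assumes the mount path contains no spaces.
report_usage() {
	local mnt=$1

	df -P -k "$mnt" | awk 'NR == 2 {
		printf "%s: %d of %d KiB used (%s)\n", $6, $3, $2, $5
	}'
}
```

For example, calling report_usage "$SCRATCH_MNT" >>$seqres.full just before
the exit in _create_file would record the usage at the failure point, which
can then be compared against fill_percent.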

> 
> Thanks,
> Hans




