[ +CC Boris ]

On 11.05.24 07:08, Hans Holmberg wrote:
> On 2024-05-08 10:51, Zorro Lang wrote:
>> On Wed, May 08, 2024 at 07:08:01AM +0000, Hans Holmberg wrote:
>>> On 2024-04-17 16:50, Hans Holmberg wrote:
>>>> On 2024-04-17 16:07, Zorro Lang wrote:
>>>>> On Wed, Apr 17, 2024 at 01:21:39PM +0000, Hans Holmberg wrote:
>>>>>> On 2024-04-17 14:43, Zorro Lang wrote:
>>>>>>> On Tue, Apr 16, 2024 at 11:54:37AM -0700, Darrick J. Wong wrote:
>>>>>>>> On Tue, Apr 16, 2024 at 09:07:43AM +0000, Hans Holmberg wrote:
>>>>>>>>> +Zorro (doh!)
>>>>>>>>>
>>>>>>>>> On 2024-04-15 13:23, Hans Holmberg wrote:
>>>>>>>>>> This test stresses garbage collection for file systems by first filling
>>>>>>>>>> up a scratch mount to a specific usage point with files of random size,
>>>>>>>>>> then doing overwrites in parallel with deletes to fragment the backing
>>>>>>>>>> storage, forcing reclaim.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> Test results in my setup (kernel 6.8.0-rc4+):
>>>>>>>>>> f2fs on zoned nullblk: pass (77s)
>>>>>>>>>> f2fs on conventional nvme ssd: pass (13s)
>>>>>>>>>> btrfs on zoned nullblk: fails (-ENOSPC)
>>>>>>>>>> btrfs on conventional nvme ssd: fails (-ENOSPC)
>>>>>>>>>> xfs on conventional nvme ssd: pass (8s)
>>>>>>>>>>
>>>>>>>>>> Johannes (cc) is working on the btrfs ENOSPC issue.
>>>>>>>>>>
>>>>>>>>>>  tests/generic/744     | 124 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>  tests/generic/744.out |   6 ++
>>>>>>>>>>  2 files changed, 130 insertions(+)
>>>>>>>>>>  create mode 100755 tests/generic/744
>>>>>>>>>>  create mode 100644 tests/generic/744.out
>>>>>>>>>>
>>>>>>>>>> diff --git a/tests/generic/744 b/tests/generic/744
>>>>>>>>>> new file mode 100755
>>>>>>>>>> index 000000000000..2c7ab76bf8b1
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/tests/generic/744
>>>>>>>>>> @@ -0,0 +1,124 @@
>>>>>>>>>> +#! /bin/bash
>>>>>>>>>> +# SPDX-License-Identifier: GPL-2.0
>>>>>>>>>> +# Copyright (c) 2024 Western Digital Corporation. All Rights Reserved.
>>>>>>>>>> +#
>>>>>>>>>> +# FS QA Test No. 744
>>>>>>>>>> +#
>>>>>>>>>> +# Inspired by btrfs/273 and generic/015
>>>>>>>>>> +#
>>>>>>>>>> +# This test stresses garbage collection in file systems
>>>>>>>>>> +# by first filling up a scratch mount to a specific usage point with
>>>>>>>>>> +# files of random size, then doing overwrites in parallel with
>>>>>>>>>> +# deletes to fragment the backing zones, forcing reclaim.
>>>>>>>>>> +
>>>>>>>>>> +. ./common/preamble
>>>>>>>>>> +_begin_fstest auto
>>>>>>>>>> +
>>>>>>>>>> +# real QA test starts here
>>>>>>>>>> +
>>>>>>>>>> +_require_scratch
>>>>>>>>>> +
>>>>>>>>>> +# This test requires specific data space usage, skip if we have compression
>>>>>>>>>> +# enabled.
>>>>>>>>>> +_require_no_compress
>>>>>>>>>> +
>>>>>>>>>> +M=$((1024 * 1024))
>>>>>>>>>> +min_fsz=$((1 * ${M}))
>>>>>>>>>> +max_fsz=$((256 * ${M}))
>>>>>>>>>> +bs=${M}
>>>>>>>>>> +fill_percent=95
>>>>>>>>>> +overwrite_percentage=20
>>>>>>>>>> +seq=0
>>>>>>>>>> +
>>>>>>>>>> +_create_file() {
>>>>>>>>>> +	local file_name=${SCRATCH_MNT}/data_$1
>>>>>>>>>> +	local file_sz=$2
>>>>>>>>>> +	local dd_extra=$3
>>>>>>>>>> +
>>>>>>>>>> +	POSIXLY_CORRECT=yes dd if=/dev/zero of=${file_name} \
>>>>>>>>>> +		bs=${bs} count=$(( $file_sz / ${bs} )) \
>>>>>>>>>> +		status=none $dd_extra 2>&1
>>>>>>>>>> +
>>>>>>>>>> +	status=$?
>>>>>>>>>> +	if [ $status -ne 0 ]; then
>>>>>>>>>> +		echo "Failed writing $file_name" >>$seqres.full
>>>>>>>>>> +		exit
>>>>>>>>>> +	fi
>>>>>>>>>> +}
>>>>>>>>
>>>>>>>> I wonder, is there a particular reason for doing all these file
>>>>>>>> operations with shell code instead of using fsstress to create and
>>>>>>>> delete files to fill the fs and stress all the zone-gc code? This test
>>>>>>>> reminds me a lot of generic/476 but with more fork()ing.
>>>>>>>
>>>>>>> /me has the same confusion. Can this test cover more things than using
>>>>>>> fsstress (to do a reclaim test)? Or does it uncover some known bugs which
>>>>>>> other cases can't?
>>>>>>
>>>>>> ah, adding some more background is probably useful:
>>>>>>
>>>>>> I've been using this test to stress the crap out of the zoned xfs garbage
>>>>>> collection / write throttling implementation for zoned rt subvolume
>>>>>> support in xfs, and it has found a number of issues during implementation
>>>>>> that I did not reproduce by other means.
>>>>>>
>>>>>> I think it also has wider applicability as it triggers bugs in btrfs.
>>>>>> f2fs passes without issues, but probably benefits from a quick smoke gc
>>>>>> test as well. Discussed this with Bart and Daeho (now in cc) before
>>>>>> submitting.
>>>>>>
>>>>>> Using fsstress would be cool, but as far as I can tell it cannot
>>>>>> be told to operate at a specific file system usage point, which
>>>>>> is a key thing for this test.
>>>>>
>>>>> As a random test case, if this case can be transformed to use fsstress to
>>>>> cover the same issues, that would be nice.
>>>>>
>>>>> But if, as a regression test case, it has its particular test coverage, and
>>>>> the issues it covers can't be reproduced the fsstress way, then let's work
>>>>> on this bash script one.
>>>>>
>>>>> Any thoughts?
>>>>
>>>> Yeah, I think bash is preferable for this particular test case.
>>>> Bash also makes it easy to hack for people's private uses.
>>>>
>>>> I use longer versions of this test (increasing overwrite_percentage)
>>>> for weekly testing.
>>>>
>>>> If we need fsstress for reproducing any future gc bug, we can add
>>>> what's missing to it then.
>>>>
>>>> Does that make sense?
>>>>
>>>
>>> Hey Zorro,
>>>
>>> Any remaining concerns about adding this test? I could run it across
>>> more file systems (bcachefs could be interesting) and share the results
>>> if need be.
>>
>> Hi,
>>
>> I remember you mentioned that btrfs fails this test, and I can reproduce
>> that on btrfs [1] with a regular disk. Have you figured out the reason? I
>> don't want to give btrfs a test failure suddenly without a proper
>> explanation :) If it's a test case issue, better to fix it for btrfs.
>
> I was surprised to see the failure for btrfs on a conventional block
> device, but have not dug into it. I suspect/assume it's the same root
> cause as the issue Johannes is looking into when using a zoned block
> device as backing storage.
>
> I debugged that a bit with Johannes, and noticed that if I manually
> kick btrfs rebalancing after each write via sysfs, the test progresses
> further (but super slowly).
>
> So *I think* that btrfs needs to:
>
> * tune the triggering of gc to kick in way before the available free space
>   runs out
> * start slowing down / blocking writes when reclaim pressure is high to
>   avoid premature -ENOSPCs.

Yes, both Boris and I are working on different solutions to the GC
problem. But apart from that, I have the feeling that using stat to
check on the available space is not the best idea, at least for btrfs.
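
For reference, the fill phase of the test has to poll the scratch fs
usage to know when to stop, roughly along the lines below. This is only
an illustrative sketch: _pct_used is a made-up helper and the
stat/statfs arithmetic is my assumption, not the actual test code.

_pct_used() {
	# percentage of data blocks in use on the scratch fs, derived from
	# statfs via stat(1): %b = total blocks, %a = available blocks
	stat -f -c '%b %a' ${SCRATCH_MNT} | \
		awk '{ printf "%d\n", (1 - $2 / $1) * 100 }'
}

# fill the scratch fs with randomly sized files until the target usage
# point (fill_percent) is reached
while [ $(_pct_used) -lt $fill_percent ]; do
	fsz=$(( min_fsz + RANDOM * (max_fsz - min_fsz) / 32768 ))
	_create_file $seq $fsz
	seq=$(( seq + 1 ))
done

A loop like this only sees whatever statfs reports, which is exactly
the number that can be misleading here.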

> It's a pretty nasty problem, as potentially any write could -ENOSPC
> long before the reported available space runs out when a workload
> ends up fragmenting the disk and write pressure is high.
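
For reference, the second phase of the test produces that fragmentation
by overwriting part of the data set while deleting other files
concurrently, roughly as sketched below. Again, this is only an
illustration: _overwrite_files and _delete_files are made-up names, and
the actual test uses overwrite_percentage rather than a fixed
every-other-file split.

_overwrite_files() {
	# rewrite the head of every even-numbered file in place; the blocks
	# the old data occupied in the backing zones become stale garbage
	for ((i = 0; i < $1; i += 2)); do
		_create_file $i $min_fsz "conv=notrunc"
	done
}

_delete_files() {
	# delete the odd-numbered files concurrently, freeing space while
	# the overwrites above keep the write pressure up
	for ((i = 1; i < $1; i += 2)); do
		rm -f ${SCRATCH_MNT}/data_$i
	done
}

# $seq is the number of files created by the fill loop
_overwrite_files $seq &
_delete_files $seq &
wait

That keeps the fs close to the fill point while invalidating blocks all
over the backing zones, so gc has to run under incoming writes - which
is the situation where the premature -ENOSPC shows up.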