[ +CC Boris ]

On 11.05.24 07:08, Hans Holmberg wrote:
> On 2024-05-08 10:51, Zorro Lang wrote:
>> On Wed, May 08, 2024 at 07:08:01AM +0000, Hans Holmberg wrote:
>>> On 2024-04-17 16:50, Hans Holmberg wrote:
>>>> On 2024-04-17 16:07, Zorro Lang wrote:
>>>>> On Wed, Apr 17, 2024 at 01:21:39PM +0000, Hans Holmberg wrote:
>>>>>> On 2024-04-17 14:43, Zorro Lang wrote:
>>>>>>> On Tue, Apr 16, 2024 at 11:54:37AM -0700, Darrick J. Wong wrote:
>>>>>>>> On Tue, Apr 16, 2024 at 09:07:43AM +0000, Hans Holmberg wrote:
>>>>>>>>> +Zorro (doh!)
>>>>>>>>>
>>>>>>>>> On 2024-04-15 13:23, Hans Holmberg wrote:
>>>>>>>>>> This test stresses garbage collection for file systems by first filling
>>>>>>>>>> up a scratch mount to a specific usage point with files of random size,
>>>>>>>>>> then doing overwrites in parallel with deletes to fragment the backing
>>>>>>>>>> storage, forcing reclaim.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxx>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> Test results in my setup (kernel 6.8.0-rc4+):
>>>>>>>>>> f2fs on zoned nullblk: pass (77s)
>>>>>>>>>> f2fs on conventional nvme ssd: pass (13s)
>>>>>>>>>> btrfs on zoned nullblk: fails (-ENOSPC)
>>>>>>>>>> btrfs on conventional nvme ssd: fails (-ENOSPC)
>>>>>>>>>> xfs on conventional nvme ssd: pass (8s)
>>>>>>>>>>
>>>>>>>>>> Johannes (cc) is working on the btrfs ENOSPC issue.
>>>>>>>>>>
>>>>>>>>>>  tests/generic/744     | 124 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>  tests/generic/744.out |   6 ++
>>>>>>>>>>  2 files changed, 130 insertions(+)
>>>>>>>>>>  create mode 100755 tests/generic/744
>>>>>>>>>>  create mode 100644 tests/generic/744.out
>>>>>>>>>>
>>>>>>>>>> diff --git a/tests/generic/744 b/tests/generic/744
>>>>>>>>>> new file mode 100755
>>>>>>>>>> index 000000000000..2c7ab76bf8b1
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/tests/generic/744
>>>>>>>>>> @@ -0,0 +1,124 @@
>>>>>>>>>> +#! /bin/bash
>>>>>>>>>> +# SPDX-License-Identifier: GPL-2.0
>>>>>>>>>> +# Copyright (c) 2024 Western Digital Corporation. All Rights Reserved.
>>>>>>>>>> +#
>>>>>>>>>> +# FS QA Test No. 744
>>>>>>>>>> +#
>>>>>>>>>> +# Inspired by btrfs/273 and generic/015
>>>>>>>>>> +#
>>>>>>>>>> +# This test stresses garbage collection in file systems
>>>>>>>>>> +# by first filling up a scratch mount to a specific usage point with
>>>>>>>>>> +# files of random size, then doing overwrites in parallel with
>>>>>>>>>> +# deletes to fragment the backing zones, forcing reclaim.
>>>>>>>>>> +
>>>>>>>>>> +. ./common/preamble
>>>>>>>>>> +_begin_fstest auto
>>>>>>>>>> +
>>>>>>>>>> +# real QA test starts here
>>>>>>>>>> +
>>>>>>>>>> +_require_scratch
>>>>>>>>>> +
>>>>>>>>>> +# This test requires specific data space usage, skip if we have compression
>>>>>>>>>> +# enabled.
>>>>>>>>>> +_require_no_compress
>>>>>>>>>> +
>>>>>>>>>> +M=$((1024 * 1024))
>>>>>>>>>> +min_fsz=$((1 * ${M}))
>>>>>>>>>> +max_fsz=$((256 * ${M}))
>>>>>>>>>> +bs=${M}
>>>>>>>>>> +fill_percent=95
>>>>>>>>>> +overwrite_percentage=20
>>>>>>>>>> +seq=0
>>>>>>>>>> +
>>>>>>>>>> +_create_file() {
>>>>>>>>>> +	local file_name=${SCRATCH_MNT}/data_$1
>>>>>>>>>> +	local file_sz=$2
>>>>>>>>>> +	local dd_extra=$3
>>>>>>>>>> +
>>>>>>>>>> +	POSIXLY_CORRECT=yes dd if=/dev/zero of=${file_name} \
>>>>>>>>>> +		bs=${bs} count=$(( $file_sz / ${bs} )) \
>>>>>>>>>> +		status=none $dd_extra 2>&1
>>>>>>>>>> +
>>>>>>>>>> +	status=$?
>>>>>>>>>> +	if [ $status -ne 0 ]; then
>>>>>>>>>> +		echo "Failed writing $file_name" >>$seqres.full
>>>>>>>>>> +		exit
>>>>>>>>>> +	fi
>>>>>>>>>> +}
>>>>>>>>
>>>>>>>> I wonder, is there a particular reason for doing all these file
>>>>>>>> operations with shell code instead of using fsstress to create and
>>>>>>>> delete files to fill the fs and stress all the zone-gc code? This test
>>>>>>>> reminds me a lot of generic/476 but with more fork()ing.
>>>>>>>
>>>>>>> /me has the same confusion. Can this test cover more things than using
>>>>>>> fsstress (to do a reclaim test)? Or does it uncover some known bugs which
>>>>>>> other cases can't?
>>>>>>
>>>>>> ah, adding some more background is probably useful:
>>>>>>
>>>>>> I've been using this test to stress the crap out of the zoned xfs garbage
>>>>>> collection / write throttling implementation for zoned rt subvolume
>>>>>> support in xfs, and it has found a number of issues during implementation
>>>>>> that I did not reproduce by other means.
>>>>>>
>>>>>> I think it also has wider applicability as it triggers bugs in btrfs.
>>>>>> f2fs passes without issues, but probably benefits from a quick smoke gc
>>>>>> test as well. Discussed this with Bart and Daeho (now in cc) before
>>>>>> submitting.
>>>>>>
>>>>>> Using fsstress would be cool, but as far as I can tell it cannot
>>>>>> be told to operate at a specific file system usage point, which
>>>>>> is a key thing for this test.
>>>>>
>>>>> As a random test case, if this case can be transformed to use fsstress to
>>>>> cover the same issues, that would be nice.
>>>>>
>>>>> But if, as a regression test case, it has its particular test coverage, and
>>>>> the issues it covers can't be reproduced the fsstress way, then let's work
>>>>> on this bash script one.
>>>>>
>>>>> Any thoughts?
>>>>
>>>> Yeah, I think bash is preferable for this particular test case.
>>>> Bash also makes it easy to hack for people's private uses.
>>>>
>>>> I use longer versions of this test (increasing overwrite_percentage)
>>>> for weekly testing.
>>>>
>>>> If we need fsstress for reproducing any future gc bug, we can add
>>>> what's missing to it then.
>>>>
>>>> Does that make sense?
>>>>
>>>
>>> Hey Zorro,
>>>
>>> Any remaining concerns about adding this test? I could run it across
>>> more file systems (bcachefs could be interesting) and share the results
>>> if need be.
>>
>> Hi,
>>
>> I remember you mentioned that btrfs fails this test, and I can reproduce
>> that on btrfs [1] with a regular disk. Have you figured out the reason? I
>> don't want to give btrfs a test failure suddenly without a proper
>> explanation :) If it's a test case issue, better to fix it for btrfs.
>
> I was surprised to see the failure for btrfs on a conventional block
> device, but have not dug into it. I suspect/assume it's the same root
> cause as the issue Johannes is looking into when using a zoned block
> device as backing storage.
>
> I debugged that a bit with Johannes, and noticed that if I manually
> kick btrfs rebalancing after each write via sysfs, the test progresses
> further (but super slowly).
>
> So *I think* that btrfs needs to:
>
> * tune the triggering of gc to kick in way before the available free space
>   runs out
> * start slowing down / blocking writes when reclaim pressure is high to
>   avoid premature -ENOSPCs.

Yes, both Boris and I are working on different solutions to the GC
problem. But apart from that, I have the feeling that using stat to
check on the available space is not the best idea, at least for btrfs.
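
For reference, the fill phase of the test has to poll the scratch fs
usage to know when to stop, roughly along the lines below. This is only
an illustrative sketch: _pct_used is a made-up helper and the
stat/statfs arithmetic is my assumption, not the actual test code.

_pct_used() {
	# percentage of data blocks in use on the scratch fs, derived from
	# statfs via stat(1): %b = total blocks, %a = available blocks
	stat -f -c '%b %a' ${SCRATCH_MNT} | \
		awk '{ printf "%d\n", (1 - $2 / $1) * 100 }'
}

# fill the scratch fs with randomly sized files until the target usage
# point (fill_percent) is reached
while [ $(_pct_used) -lt $fill_percent ]; do
	fsz=$(( min_fsz + RANDOM * (max_fsz - min_fsz) / 32768 ))
	_create_file $seq $fsz
	seq=$(( seq + 1 ))
done

A loop like this only sees whatever statfs reports, which is exactly
the number that can be misleading here.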

> It's a pretty nasty problem, as potentially any write could -ENOSPC
> long before the reported available space runs out when a workload
> ends up fragmenting the disk and write pressure is high.
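
For reference, the second phase of the test produces that fragmentation
by overwriting part of the data set while deleting other files
concurrently, roughly as sketched below. Again, this is only an
illustration: _overwrite_files and _delete_files are made-up names, and
the actual test uses overwrite_percentage rather than a fixed
every-other-file split.

_overwrite_files() {
	# rewrite the head of every even-numbered file in place; the blocks
	# the old data occupied in the backing zones become stale garbage
	for ((i = 0; i < $1; i += 2)); do
		_create_file $i $min_fsz "conv=notrunc"
	done
}

_delete_files() {
	# delete the odd-numbered files concurrently, freeing space while
	# the overwrites above keep the write pressure up
	for ((i = 1; i < $1; i += 2)); do
		rm -f ${SCRATCH_MNT}/data_$i
	done
}

# $seq is the number of files created by the fill loop
_overwrite_files $seq &
_delete_files $seq &
wait

That keeps the fs close to the fill point while invalidating blocks all
over the backing zones, so gc has to run under incoming writes - which
is the situation where the premature -ENOSPC shows up.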