On Wed, May 08, 2024 at 07:08:01AM +0000, Hans Holmberg wrote:
> On 2024-04-17 16:50, Hans Holmberg wrote:
> > On 2024-04-17 16:07, Zorro Lang wrote:
> >> On Wed, Apr 17, 2024 at 01:21:39PM +0000, Hans Holmberg wrote:
> >>> On 2024-04-17 14:43, Zorro Lang wrote:
> >>>> On Tue, Apr 16, 2024 at 11:54:37AM -0700, Darrick J. Wong wrote:
> >>>>> On Tue, Apr 16, 2024 at 09:07:43AM +0000, Hans Holmberg wrote:
> >>>>>> +Zorro (doh!)
> >>>>>>
> >>>>>> On 2024-04-15 13:23, Hans Holmberg wrote:
> >>>>>>> This test stresses garbage collection for file systems by first filling
> >>>>>>> up a scratch mount to a specific usage point with files of random size,
> >>>>>>> then doing overwrites in parallel with deletes to fragment the backing
> >>>>>>> storage, forcing reclaim.
> >>>>>>>
> >>>>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxx>
> >>>>>>> ---
> >>>>>>>
> >>>>>>> Test results in my setup (kernel 6.8.0-rc4+)
> >>>>>>> f2fs on zoned nullblk: pass (77s)
> >>>>>>> f2fs on conventional nvme ssd: pass (13s)
> >>>>>>> btrfs on zoned nullblk: fails (-ENOSPC)
> >>>>>>> btrfs on conventional nvme ssd: fails (-ENOSPC)
> >>>>>>> xfs on conventional nvme ssd: pass (8s)
> >>>>>>>
> >>>>>>> Johannes (cc) is working on the btrfs ENOSPC issue.
> >>>>>>>
> >>>>>>>  tests/generic/744     | 124 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>  tests/generic/744.out |   6 ++
> >>>>>>>  2 files changed, 130 insertions(+)
> >>>>>>>  create mode 100755 tests/generic/744
> >>>>>>>  create mode 100644 tests/generic/744.out
> >>>>>>>
> >>>>>>> diff --git a/tests/generic/744 b/tests/generic/744
> >>>>>>> new file mode 100755
> >>>>>>> index 000000000000..2c7ab76bf8b1
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tests/generic/744
> >>>>>>> @@ -0,0 +1,124 @@
> >>>>>>> +#! /bin/bash
> >>>>>>> +# SPDX-License-Identifier: GPL-2.0
> >>>>>>> +# Copyright (c) 2024 Western Digital Corporation.  All Rights Reserved.
> >>>>>>> +#
> >>>>>>> +# FS QA Test No. 744
> >>>>>>> +#
> >>>>>>> +# Inspired by btrfs/273 and generic/015
> >>>>>>> +#
> >>>>>>> +# This test stresses garbage collection in file systems
> >>>>>>> +# by first filling up a scratch mount to a specific usage point with
> >>>>>>> +# files of random size, then doing overwrites in parallel with
> >>>>>>> +# deletes to fragment the backing zones, forcing reclaim.
> >>>>>>> +
> >>>>>>> +. ./common/preamble
> >>>>>>> +_begin_fstest auto
> >>>>>>> +
> >>>>>>> +# real QA test starts here
> >>>>>>> +
> >>>>>>> +_require_scratch
> >>>>>>> +
> >>>>>>> +# This test requires specific data space usage, skip if we have compression
> >>>>>>> +# enabled.
> >>>>>>> +_require_no_compress
> >>>>>>> +
> >>>>>>> +M=$((1024 * 1024))
> >>>>>>> +min_fsz=$((1 * ${M}))
> >>>>>>> +max_fsz=$((256 * ${M}))
> >>>>>>> +bs=${M}
> >>>>>>> +fill_percent=95
> >>>>>>> +overwrite_percentage=20
> >>>>>>> +seq=0
> >>>>>>> +
> >>>>>>> +_create_file() {
> >>>>>>> +	local file_name=${SCRATCH_MNT}/data_$1
> >>>>>>> +	local file_sz=$2
> >>>>>>> +	local dd_extra=$3
> >>>>>>> +
> >>>>>>> +	POSIXLY_CORRECT=yes dd if=/dev/zero of=${file_name} \
> >>>>>>> +		bs=${bs} count=$(( $file_sz / ${bs} )) \
> >>>>>>> +		status=none $dd_extra 2>&1
> >>>>>>> +
> >>>>>>> +	status=$?
> >>>>>>> +	if [ $status -ne 0 ]; then
> >>>>>>> +		echo "Failed writing $file_name" >>$seqres.full
> >>>>>>> +		exit
> >>>>>>> +	fi
> >>>>>>> +}
> >>>>>
> >>>>> I wonder, is there a particular reason for doing all these file
> >>>>> operations with shell code instead of using fsstress to create and
> >>>>> delete files to fill the fs and stress all the zone-gc code?  This test
> >>>>> reminds me a lot of generic/476 but with more fork()ing.
> >>>>
> >>>> /me has the same confusion. Can this test cover more things than using
> >>>> fsstress (to do reclaim testing)? Or does it uncover some known bugs which
> >>>> other cases can't?
> >>>
> >>> ah, adding some more background is probably useful:
> >>>
> >>> I've been using this test to stress the crap out of the zoned xfs garbage
> >>> collection / write throttling implementation for zoned rt subvolume
> >>> support in xfs, and it has found a number of issues during implementation
> >>> that I did not reproduce by other means.
> >>>
> >>> I think it also has wider applicability, as it triggers bugs in btrfs.
> >>> f2fs passes without issues, but probably benefits from a quick gc smoke
> >>> test as well. I discussed this with Bart and Daeho (now in cc) before
> >>> submitting.
> >>>
> >>> Using fsstress would be cool, but as far as I can tell it cannot
> >>> be told to operate at a specific file system usage point, which
> >>> is a key thing for this test.
> >>
> >> As a random test case, if this case can be transformed to use fsstress to cover
> >> the same issues, that would be nice.
> >>
> >> But if, as a regression test case, it has its particular test coverage, and the
> >> issue it covers can't be reproduced with fsstress, then let's work on this
> >> bash script one.
> >>
> >> Any thoughts?
> >
> > Yeah, I think bash is preferable for this particular test case.
> > Bash also makes it easy to hack for people's private uses.
> >
> > I use longer versions of this test (increasing overwrite_percentage)
> > for weekly testing.
> >
> > If we need fsstress for reproducing any future gc bug, we can add
> > what's missing to it then.
> >
> > Does that make sense?
> >
>
> Hey Zorro,
>
> Any remaining concerns about adding this test? I could run it across
> more file systems (bcachefs could be interesting) and share the results
> if need be.

Hi,

I remember you mentioned btrfs fails on this test, and I can reproduce it
on btrfs [1] with a general disk. Have you figured out the reason? I don't
want to give btrfs a test failure suddenly without a proper explanation :)
If it's an issue with the test case, it's better to fix it for btrfs.

Thanks,
Zorro

# ./check generic/744
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 6.9.0-0.rc5.20240425gite88c4cfcb7b8.47.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 25 14:21:52 UTC 2024
MKFS_OPTIONS  -- /dev/sda4
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda4 /mnt/scratch

generic/744 115s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests/results//generic/744.out.bad)
    --- tests/generic/744.out	2024-05-08 16:11:14.476635417 +0800
    +++ /root/git/xfstests/results//generic/744.out.bad	2024-05-08 16:46:03.617194377 +0800
    @@ -2,5 +2,4 @@
     Starting fillup using direct IO
     Starting mixed write/delete test using direct IO
     Starting mixed write/delete test using buffered IO
    -Syncing
    -Done, all good
    +dd: error writing '/mnt/scratch/data_82': No space left on device
    ...
    (Run 'diff -u /root/git/xfstests/tests/generic/744.out /root/git/xfstests/results//generic/744.out.bad'  to see the entire diff)
Ran: generic/744
Failures: generic/744
Failed 1 of 1 tests

>
> Thanks,
> Hans
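
The property Hans singles out as missing from fsstress is operating at a specific file system usage point. The quoted hunk stops after _create_file(), so the patch's actual fill logic is not shown in this thread. What follows is only a minimal bash sketch of one way to plan such a fill, assuming the target is fill_percent of total capacity and reusing the patch's min_fsz, max_fsz, bs and fill_percent values. The helper names fill_target_bytes, random_file_size and plan_fill are hypothetical, and the sketch only plans file sizes; it does not write anything.

```bash
#!/bin/bash
# Hypothetical sketch (not the code from the patch): plan how much data to
# write so a file system reaches fill_percent usage, split into files of
# random size between min_fsz and max_fsz, in whole multiples of bs.

# Bytes still needed for used/total to reach fill_percent (0 if already there).
fill_target_bytes() {
	local total=$1 used=$2 fill_percent=$3
	local target=$(( total * fill_percent / 100 ))

	if (( used >= target )); then
		echo 0
	else
		echo $(( target - used ))
	fi
}

# Random size in [min_fsz, max_fsz] that is a multiple of bs, assuming
# min_fsz itself is a multiple of bs (as in the patch: 1M..256M, bs=1M).
random_file_size() {
	local min_fsz=$1 max_fsz=$2 bs=$3
	local nr_sizes=$(( (max_fsz - min_fsz) / bs + 1 ))
	# $RANDOM is only 15 bits, so combine two draws into a 30-bit value.
	local r=$(( ($RANDOM << 15 | $RANDOM) % nr_sizes ))

	echo $(( min_fsz + r * bs ))
}

# Print one file size per line until the remaining budget is used up. The
# last file is shrunk so the plan never overshoots the target; any remainder
# smaller than bs is left unwritten.
plan_fill() {
	local remaining=$1 min_fsz=$2 max_fsz=$3 bs=$4
	local sz

	while (( remaining >= bs )); do
		sz=$(random_file_size "$min_fsz" "$max_fsz" "$bs")
		if (( sz > remaining )); then
			sz=$(( remaining / bs * bs ))
		fi
		echo "$sz"
		remaining=$(( remaining - sz ))
	done
}

# Example: a 10 GiB file system with 1 GiB already used, filled to 95%.
M=$((1024 * 1024))
need=$(fill_target_bytes $((10240 * M)) $((1024 * M)) 95)
nr_files=0
planned=0
while read -r sz; do
	nr_files=$(( nr_files + 1 ))
	planned=$(( planned + sz ))
done < <(plan_fill "$need" "$M" $((256 * M)) "$M")
echo "need=$need planned=$planned files=$nr_files"
```

In the example, need is a whole number of megabytes, so planned ends up equal to need while the number of files varies from run to run. On a real scratch mount, total and used bytes could be read with GNU df (df -B1 --output=size,used "$SCRATCH_MNT"); keep in mind that reserved blocks, metadata overhead and compression (the reason the patch calls _require_no_compress) can make the resulting usage drift from the plan.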