Re: [PATCH] generic: add gc stress test

Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx> · Wed, 8 May 2024 11:02:17 +0000

On 08.05.24 11:28, Qu Wenruo wrote:
> 
> 
> On 2024/5/8 18:21, Zorro Lang wrote:
> [...]
>>>>
>>>
>>> Hey Zorro,
>>>
>>> Any remaining concerns for adding this test? I could run it across
>>> more file systems (bcachefs could be interesting) and share the
>>> results if need be.
>>
>> Hi,
>>
>> I remember you mentioned btrfs fails on this test, and I can reproduce it
>> on btrfs [1] with a regular disk. Have you figured out the reason? I don't
>> want to give btrfs a test failure suddenly without a proper explanation :)
>> If it's a test case issue, better to fix it for btrfs.
>>
>> Thanks,
>> Zorro
>>
>> # ./check generic/744
>> FSTYP         -- btrfs
>> PLATFORM      -- Linux/x86_64 hp-dl380pg8-01 6.9.0-0.rc5.20240425gite88c4cfcb7b8.47.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 25 14:21:52 UTC 2024
>> MKFS_OPTIONS  -- /dev/sda4
>> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda4 /mnt/scratch
>>
>> generic/744 115s ... [failed, exit status 1]- output mismatch (see /root/git/xfstests/results//generic/744.out.bad)
>>       --- tests/generic/744.out   2024-05-08 16:11:14.476635417 +0800
>>       +++ /root/git/xfstests/results//generic/744.out.bad 2024-05-08 16:46:03.617194377 +0800
>>       @@ -2,5 +2,4 @@
>>        Starting fillup using direct IO
>>        Starting mixed write/delete test using direct IO
>>        Starting mixed write/delete test using buffered IO
>>       -Syncing
>>       -Done, all good
>>       +dd: error writing '/mnt/scratch/data_82': No space left on device
> 
> [POSSIBLE CAUSE]
> Not an expert on zoned support, but even with the 95% fill rate setup,
> the test case still completely fills the btrfs data space, so no more
> data can be written.

Yes I /think/ Zorro's report above is with a regular (i.e. non-zoned) setup.

> My guess is that the reported available space takes some metadata space
> into account, so even when the last available bytes of data space are
> being consumed, `stat -f -c '%a'` still reports a value larger than 5%.
> 
> But as soon as the data space is completely filled up, btrfs notices
> that there is no way to allocate more data, and reports its available
> bytes as 0.
> 
> This means the reported available space stays above 5% and then
> suddenly drops to 0, causing the test script to fail.
> 
> Unfortunately I do not have any good idea that can easily solve the
> problem. Due to the nature of dynamic block group allocation, the
> available/free space reporting is never that reliable.
> 
> [WORKAROUND?]
> I'm just wondering whether we could fill up the fs to 100% (hitting
> ENOSPC), then remove 5% of all the files to emulate a 95% filled fs?
> 
> That would be a more accurate way to emulate 95% used data space,
> without relying on fs-specific available space reporting.

This won't work on zoned though. If we fill to 100% and then remove 5% 
we'd still need to run balance/gc to really free up that 5%.
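
For illustration only, a minimal sketch of the fill-then-delete idea
quoted above. This is not code from the patch under review; the helper
names fill_until_enospc and remove_fraction, the data_N file naming and
the 1 MiB block size are all assumptions. Note that the fill loop stops
on any dd failure, not only on ENOSPC.

```shell
#!/bin/bash
# Hypothetical helpers sketching the "fill to 100%, then delete 5%" idea.
# Names and sizes are illustrative assumptions, not part of the patch.

# Write files of SIZE_MB MiB each into DIR until dd fails (expected to be
# ENOSPC on a full fs, but any dd error ends the loop). Prints the number
# of files that were written completely.
fill_until_enospc() {
    local dir=$1 size_mb=$2 i=0
    while dd if=/dev/zero of="$dir/data_$i" bs=1M count="$size_mb" \
             conv=fsync 2>/dev/null; do
        i=$(( i + 1 ))
    done
    echo "$i"
}

# Remove PERCENT percent (rounded down) of the regular files directly
# inside DIR, taking them in sorted name order.
remove_fraction() {
    local dir=$1 percent=$2
    local total to_remove
    total=$(find "$dir" -maxdepth 1 -type f | wc -l)
    to_remove=$(( total * percent / 100 ))
    [ "$to_remove" -gt 0 ] || return 0
    find "$dir" -maxdepth 1 -type f | sort | head -n "$to_remove" |
        while IFS= read -r f; do
            rm -f -- "$f"
        done
}
```

A test would call fill_until_enospc on the scratch mount and then
remove_fraction with 5. As said above, on zoned the deleted 5% only
becomes usable again after balance/gc has run.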

And then there is a second problem: for zoned we need to reserve at
least one block-group as a relocation target (I did send an RFC patch
for that a while ago [1]).

[1] 
https://lore.kernel.org/linux-btrfs/1480374e3f65371d4b857fb45a3fd9f6a5fa4a25.1713357984.git.jth@xxxxxxxxxx/