Re: [PATCH] generic: add gc stress test




On 2024-05-13 09:33, Qu Wenruo wrote:
> 
> 
> On 2024/5/13 02:26, Johannes Thumshirn wrote:
>> [ +CC Boris ]
> [...]
>>> I was surprised to see the failure for btrfs on a conventional block
>>> device, but have not dug into it. I suspect/assume it's the same root
>>> cause as the issue Johannes is looking into when using a zoned block
>>> device as backing storage.
>>>
>>> I debugged that a bit with Johannes, and noticed that if I manually
>>> kick btrfs rebalancing after each write via sysfs, the test progresses
>>> further (but super slow).
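
For reference, one way to experiment with earlier reclaim from userspace
is the documented bg_reclaim_threshold sysfs knob. Whether that is the
exact knob used above is an assumption on my part, and the mount point
below is just a placeholder:

    # Make block groups eligible for automatic reclaim earlier; see
    # Documentation/ABI/testing/sysfs-fs-btrfs for the exact semantics
    # (0 disables reclaim, 1-100 is a percentage threshold).
    # /mnt/scratch stands in for the zoned btrfs under test.
    MNT=/mnt/scratch
    UUID=$(findmnt -no UUID "$MNT")
    echo 50 > "/sys/fs/btrfs/$UUID/bg_reclaim_threshold"
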
>>>
>>> So *I think* that btrfs needs to:
>>>
>>> * tune the triggering of gc to kick in way before available free space
>>>       runs out
>>> * start slowing down / blocking writes when reclaim pressure is high to
>>>       avoid premature -ENOSPC:es.
>>
>> Yes, both Boris and I are working on different solutions to the GC
>> problem. But apart from that, I have the feeling that using stat to
>> check on the available space is not the best idea.
> 
> Although my previous workaround (fill to 100%, then delete 5%) is not
> going to be feasible for zoned devices, what about the two-run solution
> below?
> 
> - The first run fills the whole fs until ENOSPC.
>     Then calculate how many bytes we have really written (du?).
> 
> - Recreate the fs, fill to 95% of the above number, and start the test.
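
A rough sketch of this two-run idea in fstests shell, purely
illustrative: _scratch_* and $SCRATCH_MNT are the usual fstests
helpers, and the single-file fill is a simplification of whatever the
test actually writes.

    # Run 1: fill until -ENOSPC, then measure how much really fit.
    _scratch_mkfs >> $seqres.full 2>&1
    _scratch_mount
    # dd is expected to fail with -ENOSPC here.
    dd if=/dev/zero of=$SCRATCH_MNT/fill bs=1M conv=fsync >> $seqres.full 2>&1
    real_bytes=$(du -sb $SCRATCH_MNT | cut -f1)
    _scratch_unmount

    # Run 2: recreate the fs and only fill to 95% of what run 1 achieved.
    _scratch_mkfs >> $seqres.full 2>&1
    _scratch_mount
    target_bytes=$(( real_bytes * 95 / 100 ))
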
> 
> But even with this workaround, I'm not 100% sure this is a good idea
> for all filesystems.
> 
> AFAIK ext4/xfs can sometimes under-report the available space (i.e.,
> report no available bytes while new data can still be written).
> 
> If we always go all the way to ENOSPC to calculate the real available
> space, it may cause too much pressure.
> 
> And it may be a good idea for us btrfs guys to implement similar
> under-reporting of the available space?


My thoughts on this:

This test is not designed to test how much data we can write to a
file system, so it would be fine to decrease fill_percent to allow
for a bit of fuzziness. It would make the test take longer to run,
though.
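
For concreteness, a hypothetical way to derive the fill target from
the statfs numbers, with fill_percent as the knob (GNU stat -f shown;
the variable names are illustrative):

    # Scale the reported free space down by fill_percent to leave
    # headroom for fragmentation and metadata fuzziness.
    fill_percent=70
    avail_blocks=$(stat -f -c '%a' $SCRATCH_MNT)  # blocks free to non-root
    block_size=$(stat -f -c '%S' $SCRATCH_MNT)    # fundamental block size
    fill_bytes=$(( avail_blocks * block_size * fill_percent / 100 ))
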

BUT that does not work around the btrfs issue(s). When testing, I
tried decreasing fill_percent to something like 70 and btrfs still
-ENOSPC:ed. It's the fragmentation, and the fact that reclaim does not
happen fast enough, that cause writes to fail (I believe; Johannes &
Boris know better).

Also, how are users supposed to know how much data they can store if 
stat does not tell them that with some degree of certainty?
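
To illustrate the gap (assuming a scratch fs mounted at $SCRATCH_MNT),
one can compare what statfs promises with what actually fits:

    # What statfs reports as available, in bytes.
    avail=$(( $(stat -f -c '%a' $SCRATCH_MNT) * $(stat -f -c '%S' $SCRATCH_MNT) ))
    echo "statfs reports $avail bytes available"
    # What actually fits; dd is expected to stop at -ENOSPC.
    dd if=/dev/zero of=$SCRATCH_MNT/probe bs=1M conv=fsync
    echo "actually wrote $(stat -c '%s' $SCRATCH_MNT/probe) bytes"
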

Space accounting for fully copy-on-write file systems is a Hard
Problem (tm), especially if metadata is also fully copy-on-write, but
that should not stop us from trying to do it right :)


Thanks,
Hans


> 
> Thanks,
> Qu
>>
>>> It's a pretty nasty problem, as potentially any write could -ENOSPC
>>> long before the reported available space runs out, when a workload
>>> ends up fragmenting the disk and write pressure is high.
>>
>>
> 




