On 2024/5/13 02:26, Johannes Thumshirn wrote:
[ +CC Boris ]
[...]
I was surprised to see the failure for btrfs on a conventional block
device, but have not dug into it. I suspect/assume it has the same root
cause as the issue Johannes is looking into when using a zoned block
device as backing storage.
I debugged that a bit with Johannes, and noticed that if I manually
kick btrfs rebalancing after each write via sysfs, the test progresses
further (but super slow).
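For context, a hedged sketch of tuning the reclaim trigger via sysfs (I
don't know which exact knob was used for the per-write kick above;
bg_reclaim_threshold is one such tunable on zoned btrfs, and the UUID
below is a placeholder):

```shell
# bg_reclaim_threshold sets, as a percentage of used space, when zoned
# btrfs starts background zone reclaim. <fs-uuid> is a placeholder for
# the mounted filesystem's UUID.
FSID=<fs-uuid>
echo 50 > /sys/fs/btrfs/$FSID/bg_reclaim_threshold
```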
So *I think* that btrfs needs to:
* tune the triggering of GC to kick in well before available free space
runs out
* start slowing down / blocking writes when reclaim pressure is high, to
avoid premature -ENOSPC errors.
Yes, both Boris and I are working on different solutions to the GC
problem. But apart from that, I have the feeling that using stat to
check the available space is not the best idea.
Although my previous workaround (fill to 100%, then delete 5%) is not
going to be feasible for zoned devices, what about the two-run solution
below?

- First run: fill the whole fs until ENOSPC, then calculate how many
  bytes we have really written (du?)
- Recreate the fs, fill to 95% of the above number, and start the test
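A minimal sketch of the two runs as a shell helper (the device/mount
paths, file name, and the helper itself are hypothetical, not from any
existing test):

```shell
#!/bin/sh
# Hypothetical two-run fill: first run measures what really fits,
# second run fills to 95% of that. $1 = block device, $2 = mount point.
two_run_fill() {
	dev=$1
	mnt=$2

	# Run 1: fill until -ENOSPC, then measure the bytes that landed.
	mkfs.btrfs -f "$dev" && mount "$dev" "$mnt"
	dd if=/dev/zero of="$mnt/filler" bs=1M 2>/dev/null || true
	written=$(du -sb "$mnt" | cut -f1)
	umount "$mnt"

	# Run 2: recreate the fs and fill only to 95% of that number.
	target=$(( written * 95 / 100 ))
	mkfs.btrfs -f "$dev" && mount "$dev" "$mnt"
	dd if=/dev/zero of="$mnt/filler" bs=1M \
		count=$(( target / 1048576 ))
}
```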
But with this workaround, I'm not 100% sure this is a good idea for all
filesystems.
AFAIK ext4/xfs can sometimes under-report the available space (aka,
reporting no available bytes, but still being able to write new data).
If we always go all the way to ENOSPC to calculate the real available
space, it may cause too much pressure.
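One mechanism behind such under-reporting, as far as I know, is ext4's
reserved blocks (~5% kept for root by default): statfs's f_bavail
excludes them, so unprivileged df can show 0 while a privileged write
still succeeds. A small illustration (the "/" path is just an example):

```python
import os

# f_bfree  = total free blocks on the fs
# f_bavail = free blocks available to unprivileged users; on ext4 this
# excludes the reserved blocks, so it is <= f_bfree.
st = os.statvfs("/")
print("free blocks:", st.f_bfree, "available to non-root:", st.f_bavail)
```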
And it may be a good idea for us btrfs guys to implement similar
under-reporting of available space?
Thanks,
Qu
It's a pretty nasty problem, as potentially any write could hit -ENOSPC
long before the reported available space runs out, when a workload ends
up fragmenting the disk and write pressure is high.