On Thu, Jul 11, 2024 at 01:23:12PM +0200, Andre Noll wrote:
> On Thu, Jul 11, 09:12, Dave Chinner wrote
>
> > > Of course it’s not reproducible, but any insight how to debug this next time
> > > is much welcomed.
> >
> > Probably not a lot you can do short of reconfiguring your RAID6
> > storage devices to handle small IOs better. However, in general,
> > RAID6 /always sucks/ for small IOs, and the only way to fix this
> > problem is to use high performance SSDs to give you a massive excess
> > of write bandwidth to burn on write amplification....
>
> FWIW, our approach to mitigate the write amplification suckage of large
> HDD-backed raid6 arrays for small I/Os is to set up a bcache device
> by combining such arrays with two small SSDs (configured as raid1).

Which is effectively the same sort of setup as having an NVRAM cache
in front of the RAID6 volume (i.e. hardware RAID controller). That
can work if the cache is large enough to soak up bursts of small
writes followed by enough idle time for the back end RAID6 device to
do all its RMW cycles to clean the cache.

However, if the cache fills up with small writes, then slowdowns and
IO latencies get even worse than if you are just using a plain RAID6
device. Think about a cache with several million cached random 4kB
writes, and how long that will take to flush to the RAID6 volume
that might only be able to do 100 IOPS. It's not uncommon to see
such setups stall for *hours* in situations like this.

We get stalls like this on hardware RAID reported to us at least a
couple of times a year. There's little we can do about it because
writeback caching mode is being used to boost burst performance and
there's not enough idle time between the bursts to drain the cache.
Yes, they could use write-through caching, but that doesn't improve
the performance of bursty workloads.

Hence deploying a fast cache in front of a very slow drive is not
exactly straightforward.
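To put rough numbers on the flush example above, here is a
back-of-the-envelope sketch. The 100 IOPS figure comes from the text;
the count of 2 million cached writes is an assumed stand-in for
"several million", and the model assumes each cached 4kB write costs
the back end exactly one IO, which is a simplification. The function
name and parameters are illustrative only.

```python
def drain_time_seconds(cached_writes: int, backend_iops: float,
                       incoming_iops: float = 0.0) -> float:
    """Estimate the seconds needed to flush a writeback cache.

    cached_writes: number of dirty small writes held in the cache
    backend_iops:  small writes per second the RAID6 volume can retire
    incoming_iops: new small writes per second still arriving (0 = idle)
    """
    net_rate = backend_iops - incoming_iops
    if net_rate <= 0:
        # Writes arrive at least as fast as they are flushed:
        # the cache never drains.
        return float("inf")
    return cached_writes / net_rate


# Assumed figures: 2 million cached random 4kB writes, 100 IOPS back end.
CACHED_WRITES = 2_000_000
dirty_gib = CACHED_WRITES * 4096 / 2**30
idle_hours = drain_time_seconds(CACHED_WRITES, 100) / 3600

print(f"dirty data: {dirty_gib:.1f} GiB")        # dirty data: 7.6 GiB
print(f"idle time needed: {idle_hours:.1f} h")   # idle time needed: 5.6 h
```

Under these assumptions less than 8 GiB of dirty data needs more than
five hours of completely idle time to flush, and any writes that keep
arriving during the flush only stretch that out further.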
Making it work reliably requires awareness of workload IO patterns.
Special attention needs to be paid to the amount of idle time. If
there isn't enough idle time, the cache will eventually stall and it
will take much longer to recover than a stall on a plain RAID volume.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx