On Tue, Jul 11, 2023 at 05:31:13PM +0200, Eugene K. wrote:
> Hello.
>
> During investigation of flapping performance problem, it was detected that
> once a process writes big amount of data in a row, the filesystem focus on
> this writing and no other process can perform any IO on this filesystem.
>
> We have noticed huge %iowait on software raid1 (mdraid) that runs on 2 SSD
> drives - on every attempt to write more than 1GB.
>
> The issue happens on any server running 6.4.2, 6.4.0, 6.3.3, 6.2.12 kernel.
> Upon investigating and testing it appeared that server IO performance can be
> completely killed with a single command:
>
> #cat /dev/zero > ./removeme
>
> assuming the ~/removeme file resides on rootfs and rootfs is XFS.
>
> While running this, the server becomes so unresponsive that after ~15
> seconds it's not even possible to login via ssh!
>
> We did reproduce this on every machine with XFS as rootfs running mentioned
> kernels. However, when we converted rootfs from XFS to EXT4(and btrfs), the
> problem disappeared - with the same OS, same kernel binary, same hardware,
> just using ext4 or btrfs instead of xfs.

So use ext4.

--D

> Note. During the hang and being unresponsive, SSD drives are writing data at
> expected performance. Just all the processes except the writing one hang.
>
>
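
For anyone who wants to put a number on the stall described in the report, here is a minimal sketch, not something from the original report. It assumes GNU coreutils (`dd` with `status=none` and `conv=fsync`, `date +%s%N`). The function names, the `probe` file name and the sizes are made up for illustration. Unlike the unbounded `cat /dev/zero`, the writer here is capped so it cannot fill the disk. While it runs, one small fsync'd write is timed per second.

```shell
#!/usr/bin/env bash
# Sketch only: function names, the "probe" file and the sizes are hypothetical.

# Write a bounded amount of zeroes (size in MiB) to $1/removeme. This stands
# in for the unbounded `cat /dev/zero > ./removeme` from the report.
bounded_write() {
    local dir=$1 size_mb=$2
    dd if=/dev/zero of="$dir/removeme" bs=1M count="$size_mb" status=none
}

# Time a single 4 KiB write followed by fsync in $1; print latency in ms.
probe_latency_ms() {
    local dir=$1 start end
    start=$(date +%s%N)
    dd if=/dev/zero of="$dir/probe" bs=4k count=1 conv=fsync status=none
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Run the large write in the background and probe once per second until the
# writer exits, then clean up both files.
measure_stall() {
    local dir=$1 size_mb=$2 writer
    bounded_write "$dir" "$size_mb" &
    writer=$!
    while kill -0 "$writer" 2>/dev/null; do
        echo "probe latency: $(probe_latency_ms "$dir") ms"
        sleep 1
    done
    wait "$writer"
    rm -f "$dir/removeme" "$dir/probe"
}
```

Something like `measure_stall /some/dir/on/xfs 2048` would then print one latency line per second. Running the same thing against an ext4 or btrfs mount on the same hardware should make the difference described above visible, if it is there.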