Hello.
While investigating an intermittent (flapping) performance problem, we
found that once a process writes a large amount of data sequentially,
the filesystem concentrates on that write and no other process can
perform any IO on the same filesystem.
We see very high %iowait on a software RAID1 (mdraid) array that runs on
2 SSD drives, on every attempt to write more than 1GB.
The issue happens on every server running a 6.4.2, 6.4.0, 6.3.3, or
6.2.12 kernel. After further investigation and testing, it turned out
that server IO performance can be completely killed with a single command:
#cat /dev/zero > ./removeme
assuming the ./removeme file resides on rootfs and rootfs is XFS.
While this runs, the server becomes so unresponsive that after ~15
seconds it is not even possible to log in via ssh!
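
To put a number on the stall, something like the following could be run
from a shell opened before the test. It is only a sketch, not part of
our original testing: probe_latency and its arguments are hypothetical
names, it assumes GNU dd (for conv=fsync), and it uses a bounded dd
instead of an endless cat so the disk is not filled. It starts the big
sequential write in the background and times a 4 KiB synced write to the
same directory once per second.

```shell
#!/bin/sh
# probe_latency DIR BULK_MB PROBES   (hypothetical helper, not a standard tool)
# Starts a large sequential write in DIR, then times small fsync'ed
# writes to the same filesystem while the large write is in flight.
probe_latency() {
    pl_dir=$1
    pl_bulk_mb=$2
    pl_probes=$3
    pl_bulk="$pl_dir/removeme"
    pl_probe="$pl_dir/removeme.probe"

    # Bounded stand-in for `cat /dev/zero > ./removeme`.
    dd if=/dev/zero of="$pl_bulk" bs=1M count="$pl_bulk_mb" >/dev/null 2>&1 &
    pl_writer=$!

    pl_i=0
    while [ "$pl_i" -lt "$pl_probes" ]; do
        pl_start=$(date +%s)
        # One 4 KiB write, flushed to disk before dd exits.
        dd if=/dev/zero of="$pl_probe" bs=4k count=1 conv=fsync >/dev/null 2>&1
        pl_end=$(date +%s)
        echo "probe $pl_i: $((pl_end - pl_start)) s"
        pl_i=$((pl_i + 1))
        if [ "$pl_i" -lt "$pl_probes" ]; then
            # Stop early once the large write has finished.
            kill -0 "$pl_writer" 2>/dev/null || break
            sleep 1
        fi
    done

    # Stop the writer if it is still running and clean up both files.
    kill "$pl_writer" 2>/dev/null || true
    wait "$pl_writer" 2>/dev/null || true
    rm -f "$pl_bulk" "$pl_probe"
}
```

For example, `probe_latency / 4096 30` run as root would write up to 4GB
to rootfs and print up to 30 probe timings. Timings are whole seconds,
which is coarse but enough to tell a healthy system from a multi-second
stall.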
We reproduced this on every machine with XFS as rootfs running the
mentioned kernels. However, when we converted rootfs from XFS to ext4
(and to btrfs), the problem disappeared: same OS, same kernel binary,
same hardware, just ext4 or btrfs instead of XFS.
Note: while the server is hung and unresponsive, the SSD drives keep
writing data at the expected rate. Only the writing process makes
progress; all other processes hang.
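
One way to see which processes are stuck during the hang is to sample
the process list for tasks in uninterruptible sleep (state D). This is a
sketch under the assumption that the hung processes are blocked in D
state, which we have not confirmed above; filter_blocked is a
hypothetical helper name, and the `state` column is the one provided by
procps ps.

```shell
#!/bin/sh
# filter_blocked: hypothetical helper.
# Reads `ps -eo state,pid,comm` style lines on stdin and prints the
# PID and command name of every process whose state is D.
filter_blocked() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Example use, from a shell that was opened before the hang started:
#   while sleep 1; do date; ps -eo state,pid,comm | filter_blocked; done
```

If the assumption holds, the list should grow shortly after the large
write starts and shrink again once it stops.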