On 7/17/19 11:02 AM, Lennert Buytenhek wrote:
Hello! I've been running into an issue with background fstrim on large
xfs filesystems on RAID10'd SSDs taking a lot of time to complete and
starving out other I/O to the filesystem. There seem to be a few
different issues involved here, but the main one appears to be that
BLKDISCARD on a RAID10 md block device sends many small discard
requests down to the underlying component devices (while this doesn't
seem to be an issue for RAID0 or for RAID1). It's quite easy to
reproduce this using just in-memory loop devices, for example by doing:

cd /dev/shm
touch loop0
touch loop1
touch loop2
touch loop3
truncate -s 7681501126656 loop0
truncate -s 7681501126656 loop1
truncate -s 7681501126656 loop2
truncate -s 7681501126656 loop3
losetup /dev/loop0 loop0
losetup /dev/loop1 loop1
losetup /dev/loop2 loop2
losetup /dev/loop3 loop3
mdadm --create -n 4 -c 512 -l 0 --assume-clean /dev/md0 /dev/loop[0123]
time blkdiscard /dev/md0
mdadm --stop /dev/md0
mdadm --create -n 4 -c 512 -l 1 --assume-clean /dev/md0 /dev/loop[0123]
time blkdiscard /dev/md0
mdadm --stop /dev/md0
mdadm --create -n 4 -c 512 -l 10 --assume-clean /dev/md0 /dev/loop[0123]
time blkdiscard /dev/md0

This simulates trimming RAID0/1/10 arrays with 4x7.68TB component
devices, and the blkdiscard completion times are as follows:

RAID0    0m0.213s
RAID1    0m2.667s
RAID10  10m44.814s
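One way to confirm what actually reaches the members is to watch the
discard traffic on a component device while the blkdiscard runs; a
rough sketch with blktrace (device name taken from the loop-device
reproduction above):

# In a second terminal, trace only the discard requests hitting one member
# (needs the blktrace package):
blktrace -d /dev/loop0 -a discard -o - | blkparse -i -

For RAID10 this should show a long stream of small (chunk-sized)
discards, while for RAID0/RAID1 only a few very large ones go by.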
IIUC, there is no dedicated function to handle discard requests for
raid10 and raid1. raid1 has better performance than raid10 either
because of the new barrier mechanism or because it doesn't need to
translate the address from virtual to physical.
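If raid10 really is splitting each discard at chunk boundaries (which
the many small requests suggest), a back-of-the-envelope count for this
particular array shows the scale of the difference; the numbers below
assume the default near-2 layout:

# ~15.36 TB of raid10 capacity (2 data copies across 4 x 7681501126656-byte
# members) divided into 512 KiB chunks:
echo $(( 2 * 7681501126656 / (512 * 1024) ))   # ~29.3 million chunk-sized discards
# versus only a handful of large discards per member for raid0/raid1.

Thanks,
Guoqing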