On 7/17/19 11:02 AM, Lennert Buytenhek wrote:
Hello! I've been running into an issue with background fstrim on large
xfs filesystems on RAID10'd SSDs taking a lot of time to complete and
starving out other I/O to the filesystem. There seem to be a few
different issues involved here, but the main one appears to be that
BLKDISCARD on a RAID10 md block device sends many small discard
requests down to the underlying component devices (while this doesn't
seem to be an issue for RAID0 or for RAID1). It's quite easy to
reproduce this using just in-memory loop devices, for example by doing:

cd /dev/shm
touch loop0
touch loop1
touch loop2
touch loop3
truncate -s 7681501126656 loop0
truncate -s 7681501126656 loop1
truncate -s 7681501126656 loop2
truncate -s 7681501126656 loop3
losetup /dev/loop0 loop0
losetup /dev/loop1 loop1
losetup /dev/loop2 loop2
losetup /dev/loop3 loop3
mdadm --create -n 4 -c 512 -l 0 --assume-clean /dev/md0 /dev/loop[0123]
time blkdiscard /dev/md0
mdadm --stop /dev/md0
mdadm --create -n 4 -c 512 -l 1 --assume-clean /dev/md0 /dev/loop[0123]
time blkdiscard /dev/md0
mdadm --stop /dev/md0
mdadm --create -n 4 -c 512 -l 10 --assume-clean /dev/md0 /dev/loop[0123]
time blkdiscard /dev/md0

This simulates trimming RAID0/1/10 arrays with 4x7.68TB component
devices, and the blkdiscard completion times are as follows:

RAID0    0m0.213s
RAID1    0m2.667s
RAID10  10m44.814s
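One way to confirm what actually reaches the members is to watch the
discard traffic on a component device while the blkdiscard runs; a
rough sketch with blktrace (device name taken from the loop-device
reproduction above):

# In a second terminal, trace only the discard requests hitting one member
# (needs the blktrace package):
blktrace -d /dev/loop0 -a discard -o - | blkparse -i -

For RAID10 this should show a long stream of small (chunk-sized)
discards, while for RAID0/RAID1 only a few very large ones go by.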
IIUC, there is no dedicated function to handle discard requests for
raid10 and raid1. raid1 has better performance than raid10 either
because of the new barrier mechanism or because it doesn't need to
translate the address from virtual to physical.
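If raid10 really is splitting each discard at chunk boundaries (which
the many small requests suggest), a back-of-the-envelope count for this
particular array shows the scale of the difference; the numbers below
assume the default near-2 layout:

# ~15.36 TB of raid10 capacity (2 data copies across 4 x 7681501126656-byte
# members) divided into 512 KiB chunks:
echo $(( 2 * 7681501126656 / (512 * 1024) ))   # ~29.3 million chunk-sized discards
# versus only a handful of large discards per member for raid0/raid1.

Thanks,
Guoqing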