TL;DR: the more drives there are in the array, the slower it is.
I found this while debugging NVMe array issues, but I was able to
reduce it to a case with ram disks, so no special hardware is involved.
It can be reproduced on any machine with a reasonable amount of memory
(16-32 GB). I checked it on Ubuntu Focal (5.4.0-65-generic), Ubuntu
Groovy (5.8.0-41-generic), and CentOS 8.
For a very fast underlying device (NVMe, or brd, the block ram disk
driver), the more drives are added to the array, the slower it becomes,
regardless of the array type: raid1, raid0, raid10.
The speed loss is very high: a single ram disk yields 200-250 kIOPS,
a raid0 of 4 ram disks yields less than 150 kIOPS, and 100 ram disks
in raid0 yield a mere 30 kIOPS.
The script to reproduce the issue:
export NUM=100 # or any other number between 1 and 1920
modprobe brd rd_nr=${NUM} rd_size=10000
mdadm --create /dev/md42 -n ${NUM} -l raid0 --assume-clean /dev/ram* \
    -e 1.2 --force
fio --name test --ioengine=libaio --blocksize=4k --iodepth=${NUM} \
    --rw=randwrite --time_based --runtime=5s --ramp_time=1s --fsync=1 \
    --direct=1 --disk_util=0 --filename=/dev/md42
mdadm --stop /dev/md42
rmmod brd
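
To collect the numbers below for several array sizes in one run, a loop
like this can be used (a minimal sketch; it reuses the same brd/mdadm/fio
options as above and just greps the write IOPS line out of the fio
output):

for NUM in 1 2 4 8 16 32 64 128 256 512 1024 1920; do
    # create ${NUM} ram disks and assemble them into a raid0
    modprobe brd rd_nr=${NUM} rd_size=10000
    mdadm --create /dev/md42 -n ${NUM} -l raid0 --assume-clean /dev/ram* \
        -e 1.2 --force
    echo -n "${NUM} disks: "
    # same fio workload as above; print only the write IOPS summary line
    fio --name test --ioengine=libaio --blocksize=4k --iodepth=${NUM} \
        --rw=randwrite --time_based --runtime=5s --ramp_time=1s --fsync=1 \
        --direct=1 --disk_util=0 --filename=/dev/md42 | grep 'write: IOPS'
    # tear down before the next iteration
    mdadm --stop /dev/md42
    rmmod brd
done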
My laptop (baseline for ram0 without raid: 197k IOPS):
disks in array - IOPS
1 - 108k
2 - 103k
4 - 99.0k
8 - 89.8k
16 - 75.5k
32 - 52.6k
64 - 34.7k
128 - 20.5k
256 - 11.3k
512 - 5.9k
1024 - 3.4k
1920 - 1.5k
(Without --assume-clean the results are the same; using --iodepth=1
instead of ${NUM} also gives the same results.)
I feel that 1.5k IOPS for a raid0 consisting of 1920 ram disks is a
bit slow.
P.S. The real issue is with NVMe devices at 10 drives per array; they
show an even steeper decline, but ram disks make it easier to see.