Hi,

This simple experiment reproduces the problem.  Create a RAID1 array
using two ramdisks of size 1G:

  mdadm --create /dev/md/test --level=1 --raid-devices=2 /dev/ram0 /dev/ram1

Then use fio to test disk performance (iodepth=64 and numjobs=40;
details at the end of this email).  This is what we get on our machine
(two AMD EPYC 7002 CPUs, each with 64 cores, and 2TB of RAM; Linux
v5.10.0):

Without RAID (writing to /dev/ram0):
  READ:  IOPS=14391K  BW=56218MiB/s
  WRITE: IOPS= 6167K  BW=24092MiB/s

RAID1 (writing to /dev/md/test):
  READ:  IOPS=  542K  BW= 2120MiB/s
  WRITE: IOPS=  232K  BW=  935MiB/s

The difference, even for reads, is huge.  I used perf to find the
bottleneck; the results are included at the end of this email.  Any
ideas?

We actually run hundreds of VMs on our hosts.  The problem is that when
we put our enterprise NVMe disks in RAID1, performance degrades severely
compared to using the disks directly; it seems we hit the same
bottleneck as in the test described above.

Thanks,
Ali

Perf output:

Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
  Children      Self  Command  Shared Object      Symbol
+   97.98%     0.01%  fio      fio                [.] fio_libaio_commit
+   97.95%     0.01%  fio      libaio.so.1.0.1    [.] io_submit
+   97.85%     0.01%  fio      [kernel.kallsyms]  [k] __x64_sys_io_submit
-   97.82%     0.01%  fio      [kernel.kallsyms]  [k] io_submit_one
   - 97.81% io_submit_one
      - 54.62% aio_write
         - 54.60% blkdev_write_iter
            - 36.30% blk_finish_plug
               - flush_plug_callbacks
                  - 36.29% raid1_unplug
                     - flush_bio_list
                        - 18.44% submit_bio_noacct
                           - 18.40% brd_submit_bio
                              - 18.13% raid1_end_write_request
                                 - 17.94% raid_end_bio_io
                                    - 17.82% __wake_up_common_lock
                                       + 17.79% _raw_spin_lock_irqsave
                        - 17.79% __wake_up_common_lock
                           + 17.76% _raw_spin_lock_irqsave
            + 18.29% __generic_file_write_iter
      - 43.12% aio_read
         - 43.07% blkdev_read_iter
            - generic_file_read_iter
               - 43.04% blkdev_direct_IO
                  - 42.95% submit_bio_noacct
                     - 42.23% brd_submit_bio
                        - 41.91% raid1_end_read_request
                           - 41.70% raid_end_bio_io
                              - 41.43% __wake_up_common_lock
                                 + 41.36% _raw_spin_lock_irqsave
                     - 0.68% md_submit_bio
                          0.61% md_handle_request
+   94.90%     0.00%  fio      [kernel.kallsyms]  [k] __wake_up_common_lock
+   94.86%     0.22%  fio      [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
+   94.64%    94.64%  fio      [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
+   79.63%     0.02%  fio      [kernel.kallsyms]  [k] submit_bio_noacct

FIO configuration file:

[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
#filename=/dev/ram0
filename=/dev/md/test
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1

[job1]
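
For reference, here is a sketch of the full reproduction sequence.  It
assumes the 1G ramdisks come from the brd module; the rd_nr/rd_size
values, the job-file name, and the perf invocation are assumptions on
my part, not taken from the report above.

  # Load two 1 GiB ramdisks (rd_size is in KiB); these module
  # parameters are an assumption, adjust if /dev/ram0 and /dev/ram1
  # already exist on your system.
  modprobe brd rd_nr=2 rd_size=1048576

  # Create the RAID1 array over the two ramdisks.
  mdadm --create /dev/md/test --level=1 --raid-devices=2 /dev/ram0 /dev/ram1

  # Run the fio job file quoted above (saved here as raid1-test.fio,
  # a hypothetical name) while recording call graphs, then inspect the
  # children-overhead report.
  perf record -g -- fio raid1-test.fio
  perf report --children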