On Thu, Jun 15, 2023 at 4:04 PM Ali Gholami Rudi <aligrudi@xxxxxxxxx> wrote:
>
> Hi,
>
> This simple experiment reproduces the problem.
>
> Create a RAID1 array using two ramdisks of size 1G:
>
> mdadm --create /dev/md/test --level=1 --raid-devices=2 /dev/ram0 /dev/ram1
>
> Then use fio to test disk performance (iodepth=64 and numjobs=40;
> details at the end of this email).  This is what we get on our machine
> (two AMD EPYC 7002 CPUs, each with 64 cores, and 2TB of RAM; Linux v5.10.0):
>
> Without RAID (writing to /dev/ram0)
>    READ: IOPS=14391K  BW=56218MiB/s
>   WRITE: IOPS= 6167K  BW=24092MiB/s
>
> RAID1 (writing to /dev/md/test)
>    READ: IOPS=  542K  BW= 2120MiB/s
>   WRITE: IOPS=  232K  BW=  935MiB/s
>
> The difference, even for reading, is huge.
>
> I tried perf to see what the problem is; the results are included at
> the end of this email.
>
> Any ideas?

Hello Ali

Since it can be reproduced easily in your environment, can you try with
the latest upstream kernel?  If the problem doesn't exist with the
latest upstream kernel, you can use git bisect to find which patch
fixes this problem.

> We are actually executing hundreds of VMs on our hosts.  The problem
> is that when we use RAID1 for our enterprise NVMe disks, the
> performance degrades very much compared to using them directly; it
> seems we hit the same bottleneck as in the test described above.

So those hundreds of VMs run on the RAID1, and the RAID1 is created
with NVMe disks.  What does /proc/mdstat show?

Regards
Xiao

> Thanks,
> Ali
>
> Perf output:
>
> Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
>   Children      Self  Command  Shared Object      Symbol
> +   97.98%     0.01%  fio      fio                [.] fio_libaio_commit
> +   97.95%     0.01%  fio      libaio.so.1.0.1    [.] io_submit
> +   97.85%     0.01%  fio      [kernel.kallsyms]  [k] __x64_sys_io_submit
> -   97.82%     0.01%  fio      [kernel.kallsyms]  [k] io_submit_one
>    - 97.81% io_submit_one
>       - 54.62% aio_write
>          - 54.60% blkdev_write_iter
>             - 36.30% blk_finish_plug
>                - flush_plug_callbacks
>                   - 36.29% raid1_unplug
>                      - flush_bio_list
>                         - 18.44% submit_bio_noacct
>                            - 18.40% brd_submit_bio
>                               - 18.13% raid1_end_write_request
>                                  - 17.94% raid_end_bio_io
>                                     - 17.82% __wake_up_common_lock
>                                        + 17.79% _raw_spin_lock_irqsave
>                         - 17.79% __wake_up_common_lock
>                            + 17.76% _raw_spin_lock_irqsave
>             + 18.29% __generic_file_write_iter
>       - 43.12% aio_read
>          - 43.07% blkdev_read_iter
>             - generic_file_read_iter
>                - 43.04% blkdev_direct_IO
>                   - 42.95% submit_bio_noacct
>                      - 42.23% brd_submit_bio
>                         - 41.91% raid1_end_read_request
>                            - 41.70% raid_end_bio_io
>                               - 41.43% __wake_up_common_lock
>                                  + 41.36% _raw_spin_lock_irqsave
>                      - 0.68% md_submit_bio
>                           0.61% md_handle_request
> +   94.90%     0.00%  fio      [kernel.kallsyms]  [k] __wake_up_common_lock
> +   94.86%     0.22%  fio      [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
> +   94.64%    94.64%  fio      [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
> +   79.63%     0.02%  fio      [kernel.kallsyms]  [k] submit_bio_noacct
>
> FIO configuration file:
>
> [global]
> name=random reads and writes
> ioengine=libaio
> direct=1
> readwrite=randrw
> rwmixread=70
> iodepth=64
> buffered=0
> #filename=/dev/ram0
> filename=/dev/md/test
> size=1G
> runtime=30
> time_based
> randrepeat=0
> norandommap
> refill_buffers
> ramp_time=10
> bs=4k
> numjobs=400
> group_reporting=1
> [job1]
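
For anyone who wants to rerun this, the steps from the thread can be
collected into one script.  This is only a sketch: it assumes root, the
brd module, and mdadm/fio installed; numjobs follows the value quoted
in the prose (40), while the attached config file uses 400.

```shell
#!/bin/bash
# Sketch of the reproduction described in this thread.
# Assumes root, the brd module, and mdadm/fio available.
repro_raid1_test() {
    # Two 1G ramdisks (brd's rd_size is in KiB).
    modprobe brd rd_nr=2 rd_size=1048576

    # RAID1 over the two ramdisks, as in the original mail.
    mdadm --create /dev/md/test --level=1 --raid-devices=2 \
          /dev/ram0 /dev/ram1

    # Mixed random I/O, matching the parameters quoted in the text.
    fio --name=job1 --ioengine=libaio --direct=1 --readwrite=randrw \
        --rwmixread=70 --iodepth=64 --numjobs=40 --bs=4k --size=1G \
        --runtime=30 --time_based --ramp_time=10 --norandommap \
        --refill_buffers --group_reporting=1 --filename=/dev/md/test
}
```

To compare against the raw ramdisk, rerun the fio line with
--filename=/dev/ram0, as the commented-out line in the config suggests.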
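
Xiao's bisect suggestion, spelled out as a sketch (run inside a kernel
git tree; the version tags and the run_fio_check.sh helper here are
hypothetical, since the thread names neither a fixed kernel nor a test
script):

```shell
#!/bin/bash
# Sketch of bisecting for the commit that *fixes* a problem, using
# git bisect's custom terms.  The tags and helper script are
# hypothetical placeholders.
bisect_raid1_fix() {
    git bisect start --term-old=slow --term-new=fast
    git bisect slow v5.10          # kernel where the slowdown is seen
    git bisect fast v6.4           # hypothetical kernel where it is gone
    # With custom terms, "git bisect run" maps exit code 0 to the old
    # term (slow) and 1-127 (except 125) to the new term (fast), so
    # run_fio_check.sh should exit 0 while the fio numbers are still
    # bad and 1 once they recover.
    git bisect run ./run_fio_check.sh
    git bisect reset
}
```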