Unacceptably Poor RAID1 Performance with Many CPU Cores

Hi,

This simple experiment reproduces the problem.

Create a RAID1 array using two ramdisks of size 1G:

  mdadm --create /dev/md/test --level=1 --raid-devices=2 /dev/ram0 /dev/ram1
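
(For reference, /dev/ram0 and /dev/ram1 are brd devices; rd_size is in
KiB, so something like

  modprobe brd rd_nr=2 rd_size=1048576

should provide two 1G ramdisks, though the exact module parameters here
are only an example.)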

Then use fio to test disk performance (iodepth=64 and numjobs=40; the
full job file is at the end of this email).  This is what we get on our
machine (two AMD EPYC 7002-series CPUs with 64 cores each, and 2TB of
RAM; Linux v5.10.0):

Without RAID (writing to /dev/ram0)
READ:  IOPS=14391K BW=56218MiB/s
WRITE: IOPS= 6167K BW=24092MiB/s

RAID1 (writing to /dev/md/test)
READ:  IOPS=  542K BW= 2120MiB/s
WRITE: IOPS=  232K BW=  935MiB/s

The difference, even for reads, is huge.

I tried perf to see what the problem is; the results are included at
the end of this email.
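
(A system-wide profile like the one below can be gathered with
something along the lines of

  perf record -a -g -- sleep 30
  perf report

although the exact perf invocation is only an example.)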

Any ideas?

In practice we run hundreds of VMs on our hosts.  The problem is that
when we put RAID1 on top of our enterprise NVMe disks, performance
degrades badly compared to using the disks directly; it looks like we
hit the same bottleneck as in the test described above.

Thanks,
Ali

Perf output:

Samples: 1M of event 'cycles', Event count (approx.): 1158425235997
  Children      Self  Command  Shared Object           Symbol
+   97.98%     0.01%  fio      fio                     [.] fio_libaio_commit
+   97.95%     0.01%  fio      libaio.so.1.0.1         [.] io_submit
+   97.85%     0.01%  fio      [kernel.kallsyms]       [k] __x64_sys_io_submit
-   97.82%     0.01%  fio      [kernel.kallsyms]       [k] io_submit_one
   - 97.81% io_submit_one
      - 54.62% aio_write
         - 54.60% blkdev_write_iter
            - 36.30% blk_finish_plug
               - flush_plug_callbacks
                  - 36.29% raid1_unplug
                     - flush_bio_list
                        - 18.44% submit_bio_noacct
                           - 18.40% brd_submit_bio
                              - 18.13% raid1_end_write_request
                                 - 17.94% raid_end_bio_io
                                    - 17.82% __wake_up_common_lock
                                       + 17.79% _raw_spin_lock_irqsave
                        - 17.79% __wake_up_common_lock
                           + 17.76% _raw_spin_lock_irqsave
            + 18.29% __generic_file_write_iter
      - 43.12% aio_read
         - 43.07% blkdev_read_iter
            - generic_file_read_iter
               - 43.04% blkdev_direct_IO
                  - 42.95% submit_bio_noacct
                     - 42.23% brd_submit_bio
                        - 41.91% raid1_end_read_request
                           - 41.70% raid_end_bio_io
                              - 41.43% __wake_up_common_lock
                                 + 41.36% _raw_spin_lock_irqsave
                     - 0.68% md_submit_bio
                          0.61% md_handle_request
+   94.90%     0.00%  fio      [kernel.kallsyms]       [k] __wake_up_common_lock
+   94.86%     0.22%  fio      [kernel.kallsyms]       [k] _raw_spin_lock_irqsave
+   94.64%    94.64%  fio      [kernel.kallsyms]       [k] native_queued_spin_lock_slowpath
+   79.63%     0.02%  fio      [kernel.kallsyms]       [k] submit_bio_noacct


FIO configuration file:

[global] 
name=random reads and writes
ioengine=libaio 
direct=1
readwrite=randrw 
rwmixread=70 
iodepth=64 
buffered=0 
#filename=/dev/ram0
filename=/dev/md/test
size=1G
runtime=30 
time_based 
randrepeat=0 
norandommap 
refill_buffers 
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]
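
The job file is run with plain fio, e.g.

  fio raid1-test.fio

(raid1-test.fio is just an example name); filename points at /dev/ram0
for the plain-ramdisk run and at /dev/md/test for the RAID1 run.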



