Huge lock contention during raid5 build time

Hi

I'm building an mdadm RAID5 array (3+1) on Intel P4600 NVMe SSDs.

mdadm is a recent development build:

[root@memverge2 ~]# /home/anton/mdadm/mdadm --version
mdadm - v4.4-13-ge0df6c4c - 2025-01-17
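
For reference, the array was created roughly like this (reconstructed
from the member list in /proc/mdstat below, so the exact device order
and options may differ):

/home/anton/mdadm/mdadm --create /dev/md0 --level=5 --raid-devices=4 \
    --chunk=512 /dev/nvme4n1 /dev/nvme3n1 /dev/nvme2n1 /dev/nvme0n1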

The maximum rebuild speed I have seen is ~1.4 GB/s.

[root@memverge2 md]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
      4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [==============>......]  recovery = 71.8% (1122726044/1562681344) finish=5.1min speed=1428101K/sec
      bitmap: 0/12 pages [0KB], 65536KB chunk
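
In case it matters, these are the standard knobs for the rebuild
throttle; the value in the last command is only an example, not
necessarily what I have set:

sysctl dev.raid.speed_limit_min
sysctl dev.raid.speed_limit_max
sysctl -w dev.raid.speed_limit_max=5000000   # example ceiling, in KB/s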

Perf top shows heavy spinlock contention:

Samples: 180K of event 'cycles:P', 4000 Hz, Event count (approx.): 175146370188 lost: 0/0 drop: 0/0
Overhead  Shared Object                             Symbol
  38.23%  [kernel]                                  [k] native_queued_spin_lock_slowpath
   8.33%  [kernel]                                  [k] analyse_stripe
   6.85%  [kernel]                                  [k] ops_run_io
   3.95%  [kernel]                                  [k] intel_idle_irq
   3.41%  [kernel]                                  [k] xor_avx_4
   2.76%  [kernel]                                  [k] handle_stripe
   2.37%  [kernel]                                  [k] raid5_end_read_request
   1.97%  [kernel]                                  [k] find_get_stripe
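
The listing above is from plain perf top without call graphs. To
attribute the slow path to its callers inside md/raid5, something like
this (standard perf usage, typed from memory) should work:

perf record -a -g -- sleep 10
perf report --no-children
# then expand native_queued_spin_lock_slowpath to see which raid5
# code paths are spinning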

Samples: 1M of event 'cycles:P', 4000 Hz, Event count (approx.): 717038747938
native_queued_spin_lock_slowpath  /proc/kcore [Percent: local period]
Percent │       testl     %eax,%eax
        │     ↑ je        234
        │     ↑ jmp       23e
   0.00 │248:   shrl      $0x12, %ecx
        │       andl      $0x3,%eax
   0.00 │       subl      $0x1,%ecx
   0.00 │       shlq      $0x5, %rax
   0.00 │       movslq    %ecx,%rcx
        │       addq      $0x36ec0,%rax
   0.01 │       addq      -0x7b67b2a0(,%rcx,8),%rax
   0.02 │       movq      %rdx,(%rax)
   0.00 │       movl      0x8(%rdx),%eax
   0.00 │       testl     %eax,%eax
        │     ↓ jne       279
  62.27 │270:   pause
  17.49 │       movl      0x8(%rdx),%eax
   0.00 │       testl     %eax,%eax
   1.66 │     ↑ je        270
   0.02 │279:   movq      (%rdx),%rcx
   0.00 │       testq     %rcx,%rcx
        │     ↑ je        202
   0.02 │       prefetchw (%rcx)
        │     ↑ jmp       202
   0.00 │289:   movl      $0x1,%esi
   0.02 │       lock
        │       cmpxchgl  %esi,(%rbx)
        │     ↑ je        129
        │     ↑ jmp       20e
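
The annotation above only shows the generic qspinlock slow path, not
which spinlock is actually being fought over. On a recent enough
kernel and perf (from memory, and assuming the lock contention
tracepoints are available), the contended locks and their callers can
be listed directly with:

perf lock contention -a -b -- sleep 10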

Are there any plans to optimize this spinlock contention?

The latest PCIe 5.0 NVMe SSDs have tremendous performance
characteristics, but this spinlock contention just kills that performance.

Anton




