Dear Anton,
Thank you for your report.
Am 23.01.25 um 14:56 schrieb Anton Gavriliuk:
I'm building mdadm raid5 (3+1), based on Intel's NVMe SSD P4600.
Mdadm next version
[root@memverge2 ~]# /home/anton/mdadm/mdadm --version
mdadm - v4.4-13-ge0df6c4c - 2025-01-17
Maximum performance I saw ~1.4 GB/s.
[root@memverge2 md]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
[==============>......] recovery = 71.8% (1122726044/1562681344) finish=5.1min speed=1428101K/sec
bitmap: 0/12 pages [0KB], 65536KB chunk
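
As a quick cross-check of the recovery line above, the following minimal Python sketch recomputes the progress, the remaining time, and the throughput from the counters that /proc/mdstat prints. It assumes the "K" in the speed field and the block counters are both in units of 1024 bytes; the variable names are illustrative only.

```python
import re

# Recovery line copied from the /proc/mdstat output above.
line = ("[==============>......] recovery = 71.8% "
        "(1122726044/1562681344) finish=5.1min speed=1428101K/sec")

# Extract done/total block counters and the reported speed.
m = re.search(r"\((\d+)/(\d+)\).*speed=(\d+)K/sec", line)
done_kib, total_kib, speed_kib_s = map(int, m.groups())

# Assumption: counters and speed are in KiB (1024-byte units).
percent = 100.0 * done_kib / total_kib
remaining_min = (total_kib - done_kib) / speed_kib_s / 60
speed_gb_s = speed_kib_s * 1024 / 1e9

print(f"progress  : {percent:.1f} %")
print(f"remaining : {remaining_min:.1f} min")
print(f"throughput: {speed_gb_s:.2f} GB/s")
```

Under that assumption the numbers agree with the reported 71.8%, finish=5.1min, and roughly 1.4 GB/s.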
Perf top shows huge spinlock contention
Samples: 180K of event 'cycles:P', 4000 Hz, Event count (approx.): 175146370188 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
38.23% [kernel] [k] native_queued_spin_lock_slowpath
8.33% [kernel] [k] analyse_stripe
6.85% [kernel] [k] ops_run_io
3.95% [kernel] [k] intel_idle_irq
3.41% [kernel] [k] xor_avx_4
2.76% [kernel] [k] handle_stripe
2.37% [kernel] [k] raid5_end_read_request
1.97% [kernel] [k] find_get_stripe
Samples: 1M of event 'cycles:P', 4000 Hz, Event count (approx.): 717038747938
native_queued_spin_lock_slowpath /proc/kcore [Percent: local period]
Percent │ testl %eax,%eax
│ ↑ je 234
│ ↑ jmp 23e
0.00 │248: shrl $0x12, %ecx
│ andl $0x3,%eax
0.00 │ subl $0x1,%ecx
0.00 │ shlq $0x5, %rax
0.00 │ movslq %ecx,%rcx
│ addq $0x36ec0,%rax
0.01 │ addq -0x7b67b2a0(,%rcx,8),%rax
0.02 │ movq %rdx,(%rax)
0.00 │ movl 0x8(%rdx),%eax
0.00 │ testl %eax,%eax
│ ↓ jne 279
62.27 │270: pause
17.49 │ movl 0x8(%rdx),%eax
0.00 │ testl %eax,%eax
1.66 │ ↑ je 270
0.02 │279: movq (%rdx),%rcx
0.00 │ testq %rcx,%rcx
│ ↑ je 202
0.02 │ prefetchw (%rcx)
│ ↑ jmp 202
0.00 │289: movl $0x1,%esi
0.02 │ lock
│ cmpxchgl %esi,(%rbx)
│ ↑ je 129
│ ↑ jmp 20e
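
To put a number on the annotate output above, this small Python sketch sums the sample percentages of the four instructions between label 270 and the backward "je 270". Reading that loop as the place where a waiter spins until it is handed the lock is an interpretation, not something the perf output states; the lines are copied from the listing above.

```python
# Lines of the loop at label 270, copied from the perf annotate output.
annotate = """\
62.27 │270: pause
17.49 │ movl 0x8(%rdx),%eax
0.00 │ testl %eax,%eax
1.66 │ ↑ je 270
"""

# The first whitespace-separated field of each line is the sample percentage.
loop_share = sum(float(l.split()[0]) for l in annotate.splitlines())

print(f"share of samples in the 270 loop: {loop_share:.2f} %")
```

If that reading is right, most of the time attributed to native_queued_spin_lock_slowpath is spent waiting in this loop rather than in the surrounding bookkeeping.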
Are there any plans to optimize spinlock contention?
Latest PCIe 5.0 NVMe SSDs have tremendous performance characteristics,
but huge spinlock contention just kills that performance.
Which Linux kernel version are you testing with? A lot of work has gone
into this over the last two years. I also remember the patch *[RFC V9] md/bitmap:
Optimize lock contention.* [1]. It’d be great if you could help test it.
Kind regards,
Paul
[1]: https://lore.kernel.org/linux-raid/DM6PR12MB319444916C454CDBA6FCD358D83D2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/