> What Linux version do you test with?

Currently CentOS Stream 10.

[root@memverge2 ~]# uname -r
6.12.0-43.el10.x86_64

I can switch to Rocky Linux 9.5 if required.

> I also remember the patch *[RFC V9] md/bitmap: Optimize lock
> contention.* [1]. It’d be great if you could help testing.

Oh, I thought that the patch was already included in the mdadm version
I am running (mdadm - v4.4-13-ge0df6c4c - 2025-01-17).

If the patch is not yet applied to the latest mdadm version, how
exactly do I apply it? I'm not a Linux developer, but I would be glad
to test that patch. (I have put a rough sketch of what I would try at
the bottom of this mail – please correct me if it is wrong.)

Anyway, I believe that mdadm should be optimized for the latest
PCIe Gen 5.0 NVMe SSDs.

Anton

Thu, 23 Jan 2025 at 16:49, Paul Menzel <pmenzel@xxxxxxxxxxxxx>:
>
> Dear Anton,
>
> Thank you for your report.
>
> Am 23.01.25 um 14:56 schrieb Anton Gavriliuk:
>
> > I'm building an mdadm RAID 5 (3+1) based on Intel NVMe SSD P4600 drives.
> >
> > mdadm is the next (development) version:
> >
> > [root@memverge2 ~]# /home/anton/mdadm/mdadm --version
> > mdadm - v4.4-13-ge0df6c4c - 2025-01-17
> >
> > The maximum rebuild performance I saw was ~1.4 GB/s.
> >
> > [root@memverge2 md]# cat /proc/mdstat
> > Personalities : [raid6] [raid5] [raid4]
> > md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
> >       4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
> >       [==============>......]  recovery = 71.8% (1122726044/1562681344) finish=5.1min speed=1428101K/sec
> >       bitmap: 0/12 pages [0KB], 65536KB chunk
> >
> > perf top shows huge spinlock contention:
> >
> > Samples: 180K of event 'cycles:P', 4000 Hz, Event count (approx.): 175146370188 lost: 0/0 drop: 0/0
> > Overhead  Shared Object  Symbol
> >   38.23%  [kernel]       [k] native_queued_spin_lock_slowpath
> >    8.33%  [kernel]       [k] analyse_stripe
> >    6.85%  [kernel]       [k] ops_run_io
> >    3.95%  [kernel]       [k] intel_idle_irq
> >    3.41%  [kernel]       [k] xor_avx_4
> >    2.76%  [kernel]       [k] handle_stripe
> >    2.37%  [kernel]       [k] raid5_end_read_request
> >    1.97%  [kernel]       [k] find_get_stripe
> >
> > Samples: 1M of event 'cycles:P', 4000 Hz, Event count (approx.): 717038747938
> > native_queued_spin_lock_slowpath  /proc/kcore  [Percent: local period]
> > Percent │        testl     %eax,%eax
> >         │      ↑ je        234
> >         │      ↑ jmp       23e
> >    0.00 │ 248:   shrl      $0x12,%ecx
> >         │        andl      $0x3,%eax
> >    0.00 │        subl      $0x1,%ecx
> >    0.00 │        shlq      $0x5,%rax
> >    0.00 │        movslq    %ecx,%rcx
> >         │        addq      $0x36ec0,%rax
> >    0.01 │        addq      -0x7b67b2a0(,%rcx,8),%rax
> >    0.02 │        movq      %rdx,(%rax)
> >    0.00 │        movl      0x8(%rdx),%eax
> >    0.00 │        testl     %eax,%eax
> >         │      ↓ jne       279
> >   62.27 │ 270:   pause
> >   17.49 │        movl      0x8(%rdx),%eax
> >    0.00 │        testl     %eax,%eax
> >    1.66 │      ↑ je        270
> >    0.02 │ 279:   movq      (%rdx),%rcx
> >    0.00 │        testq     %rcx,%rcx
> >         │      ↑ je        202
> >    0.02 │        prefetchw (%rcx)
> >         │      ↑ jmp       202
> >    0.00 │ 289:   movl      $0x1,%esi
> >    0.02 │        lock
> >         │        cmpxchgl  %esi,(%rbx)
> >         │      ↑ je        129
> >         │      ↑ jmp       20e
> >
> > Are there any plans to reduce this spinlock contention?
> >
> > The latest PCIe 5.0 NVMe SSDs have tremendous performance
> > characteristics, but this huge spinlock contention just kills that
> > performance.
>
> What Linux version do you test with? A lot of work has gone into this
> in the last two years. I also remember the patch *[RFC V9] md/bitmap:
> Optimize lock contention.* [1]. It’d be great if you could help testing.
>
>
> Kind regards,
>
> Paul
>
>
> [1]: https://lore.kernel.org/linux-raid/DM6PR12MB319444916C454CDBA6FCD358D83D2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
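
P.S. The rough sketch mentioned above, of how I would try to test the
[RFC V9] series. This assumes the series is a kernel-side md change
(not an mdadm change), that it still applies to a recent mainline
tree, and that b4 and the usual kernel build tools are available; the
message ID below is a placeholder I would replace with the real one
from the lore link in [1]. Please correct me if this is the wrong
approach.

# Clone a recent mainline kernel tree (any tree the series applies to)
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux

# Fetch the [RFC V9] series from lore as an mbox and apply it;
# '<message-id>' is a placeholder for the ID from the lore link in [1]
b4 am '<message-id>'
git am ./*.mbx

# Reuse the running kernel's configuration, then build and install
cp /boot/config-"$(uname -r)" .config
make olddefconfig
make -j"$(nproc)"
make modules_install install

# Reboot into the patched kernel and repeat the RAID 5 rebuild test,
# watching /proc/mdstat and perf top for the spinlock contention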