> We need major work to make it faster so that we can keep up with
> the speed of modern SSDs.

Glad to know that this is on your roadmap. It is very important for
storage server solutions, where you can put tens of Gen 4/5 NVMe SSDs
in a 2U server. I'm not a developer, but I can assist with testing as
much as required.

> Could you please do a perf-record with '-g' so that we can see
> which call paths hit the lock contention? This will help us
> understand whether Shushu's bitmap optimization can help.

Default raid5 rebuild performance:

[root@memverge2 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
      4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [>....................]  recovery =  0.3% (5601408/1562681344) finish=125.0min speed=207459K/sec
      bitmap: 0/12 pages [0KB], 65536KB chunk

After setting group_thread_cnt and sync_speed_max:

[root@memverge2 md]# echo 8 > group_thread_cnt
[root@memverge2 md]# echo 3600000 > sync_speed_max

[root@memverge2 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
      4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [=>...................]  recovery =  7.9% (124671408/1562681344) finish=16.6min speed=1435737K/sec
      bitmap: 0/12 pages [0KB], 65536KB chunk

perf.data.gz attached.

Anton

On Thu, 23 Jan 2025 at 20:01, Song Liu <song@xxxxxxxxxx> wrote:
>
> Hi Anton,
>
> Thanks for the report.
>
> On Thu, Jan 23, 2025 at 5:56 AM Anton Gavriliuk <antosha20xx@xxxxxxxxx> wrote:
> >
> > Hi
> >
> > I'm building an mdadm raid5 (3+1) array on Intel P4600 NVMe SSDs.
> >
> > Mdadm next version:
> >
> > [root@memverge2 ~]# /home/anton/mdadm/mdadm --version
> > mdadm - v4.4-13-ge0df6c4c - 2025-01-17
> >
> > Maximum performance I saw is ~1.4 GB/s.
> >
> > [root@memverge2 md]# cat /proc/mdstat
> > Personalities : [raid6] [raid5] [raid4]
> > md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
> >       4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
> >       [==============>......]  recovery = 71.8% (1122726044/1562681344) finish=5.1min speed=1428101K/sec
> >       bitmap: 0/12 pages [0KB], 65536KB chunk
>
> Given the rebuild speed of 1.4GB/s, which is pretty fast, I do
> not think this is a regression. Lock contention in the raid5 stack,
> including but not limited to the bitmap, is a known issue. We
> need major work to make it faster so that we can keep up with
> the speed of modern SSDs.
>
> > Perf top shows huge spinlock contention:
> >
> > Samples: 180K of event 'cycles:P', 4000 Hz, Event count (approx.): 175146370188 lost: 0/0 drop: 0/0
> > Overhead  Shared Object  Symbol
> >   38.23%  [kernel]       [k] native_queued_spin_lock_slowpath
> >    8.33%  [kernel]       [k] analyse_stripe
> >    6.85%  [kernel]       [k] ops_run_io
> >    3.95%  [kernel]       [k] intel_idle_irq
> >    3.41%  [kernel]       [k] xor_avx_4
> >    2.76%  [kernel]       [k] handle_stripe
> >    2.37%  [kernel]       [k] raid5_end_read_request
> >    1.97%  [kernel]       [k] find_get_stripe
>
> Could you please do a perf-record with '-g' so that we can see
> which call paths hit the lock contention? This will help us
> understand whether Shushu's bitmap optimization can help.
>
> Thanks,
> Song
Attachment:
perf.data.gz
Description: GNU Zip compressed data
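For anyone reproducing the tuning above: group_thread_cnt and
sync_speed_max live under the array's md sysfs directory. A minimal
sketch, assuming the array is /dev/md0 (adjust the device name to your
system):

  cd /sys/block/md0/md
  echo 8 > group_thread_cnt       # use 8 stripe-handling worker groups (default 0 = single-threaded)
  echo 3600000 > sync_speed_max   # raise the per-array resync speed cap, in KiB/s (~3.6 GB/s here)
  cat /proc/mdstat                # verify the recovery speed went up

group_thread_cnt spreads stripe handling across multiple worker
threads, which is what relieves some of the single-thread bottleneck
during recovery; sync_speed_max only lifts the throttle on how fast md
is allowed to resync.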
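The thread does not show the exact perf invocation used to produce the
attached perf.data.gz. A minimal sketch of a call-graph capture as Song
requested, assuming a system-wide sample taken while the rebuild is
running (the 30-second window is an arbitrary choice):

  perf record -a -g -- sleep 30   # sample all CPUs with call graphs for 30 s
  gzip perf.data                  # compress for attaching (perf.data.gz)
  perf report -g                  # browse the recorded call paths

With '-g' recorded, perf report can expand
native_queued_spin_lock_slowpath to show which call paths are
contending for the lock.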