Re: Huge lock contention during raid5 build time

Song Liu <song@xxxxxxxxxx> · Thu, 23 Jan 2025 10:01:47 -0800

Hi Anton,

Thanks for the report.

On Thu, Jan 23, 2025 at 5:56 AM Anton Gavriliuk <antosha20xx@xxxxxxxxx> wrote:
>
> Hi
>
> I'm building mdadm raid5 (3+1), based on Intel's NVMe SSD P4600.
>
> Mdadm next version
>
> [root@memverge2 ~]# /home/anton/mdadm/mdadm --version
> mdadm - v4.4-13-ge0df6c4c - 2025-01-17
>
> Maximum performance I saw ~1.4 GB/s.
>
> [root@memverge2 md]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 nvme0n1[4] nvme2n1[2] nvme3n1[1] nvme4n1[0]
>       4688044032 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
>       [==============>......]  recovery = 71.8% (1122726044/1562681344) finish=5.1min speed=1428101K/sec
>       bitmap: 0/12 pages [0KB], 65536KB chunk

Given the rebuild speed of 1.4GB/s, which is pretty fast, I do
not think this is a regression. Lock contention in the raid5 stack,
including but not limited to the bitmap, is a known issue. We
need major work to make it faster so that we can keep up with
the speed of modern SSDs.

>
> Perf top shows huge spinlock contention
>
> Samples: 180K of event 'cycles:P', 4000 Hz, Event count (approx.): 175146370188 lost: 0/0 drop: 0/0
> Overhead  Shared Object                             Symbol
>   38.23%  [kernel]                                  [k] native_queued_spin_lock_slowpath
>    8.33%  [kernel]                                  [k] analyse_stripe
>    6.85%  [kernel]                                  [k] ops_run_io
>    3.95%  [kernel]                                  [k] intel_idle_irq
>    3.41%  [kernel]                                  [k] xor_avx_4
>    2.76%  [kernel]                                  [k] handle_stripe
>    2.37%  [kernel]                                  [k] raid5_end_read_request
>    1.97%  [kernel]                                  [k] find_get_stripe

Could you please run 'perf record' with '-g' so that we can see
which call paths hit the lock contention? This will help us
understand whether Shushu's bitmap optimization can help.
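
Something along these lines should be enough. This is only a sketch: it
assumes perf is installed and that the commands are run as root while the
recovery is still in progress. The 30-second window and the output file
name are arbitrary placeholders, not anything specific to this setup. The
snippet only prints the two commands so they can be checked before running.

```shell
# Sketch only: DURATION and OUT are placeholder values, adjust as needed.
DURATION=30                    # seconds to sample while the rebuild runs
OUT=raid5-rebuild.perf.data    # hypothetical output file name

# -a samples all CPUs, -g records call graphs, -o names the output file.
RECORD_CMD="perf record -a -g -o $OUT -- sleep $DURATION"

# --no-children reports self overhead only, so the callers of the
# spinlock slowpath appear directly under that symbol.
REPORT_CMD="perf report -i $OUT --no-children --stdio"

echo "$RECORD_CMD"
echo "$REPORT_CMD"
```

The text output of the report, or at least the call chains under
native_queued_spin_lock_slowpath, is the part that matters here.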

Thanks,
Song