Experiencing md raid5 hang and CPU lockup on kernel v6.11

Hi,

I am running fio over an RDMA block device. The server side of this
mapping is an md-raid0 device created over 3 md-raid5 devices, and
each md-raid5 device is in turn created over 8 block devices. Below is
how the raid configuration looks (md400, md300, md301 and md302 are
the relevant devices for this discussion).

$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md400 : active raid0 md300[0] md302[2] md301[1]
      19688371968 blocks super 1.2 128k chunks

md302 : active raid5 sds[0] sdz[7] sdy[6] sdx[5] sdw[4] sdv[3] sdu[2] sdt[1]
      6562922800 blocks super 1.2 level 5, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 0/1 pages [0KB], 524288KB chunk

md301 : active raid5 sdk[0] sdr[7] sdq[6] sdp[5] sdo[4] sdn[3] sdm[2] sdl[1]
      6562922800 blocks super 1.2 level 5, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 0/1 pages [0KB], 524288KB chunk

md300 : active raid5 sda[0] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      6562922800 blocks super 1.2 level 5, 16k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 0/1 pages [0KB], 524288KB chunk

md126 : active raid1 sdi3[0] sdj3[1]
      117096448 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdi2[0] sdj2[1]
      117096448 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
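
For reference, the layout above corresponds roughly to an mdadm setup
like the following (device names are taken from the mdstat output;
the exact flags used when the arrays were originally created may have
differed):

$ mdadm --create /dev/md300 --level=5 --raid-devices=8 --chunk=16 /dev/sd[a-h]
$ mdadm --create /dev/md301 --level=5 --raid-devices=8 --chunk=16 /dev/sd[k-r]
$ mdadm --create /dev/md302 --level=5 --raid-devices=8 --chunk=16 /dev/sd[s-z]
$ mdadm --create /dev/md400 --level=0 --raid-devices=3 --chunk=128 \
        /dev/md300 /dev/md301 /dev/md302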

The RDMA mapping is done through the RNBD/RTRS kernel modules, where
RTRS provides the RDMA transport and RNBD provides the block device
layer on top of it.
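On the client, the mapping is done through the rnbd-client sysfs
control interface, roughly like this (the session name and address
below are placeholders, not the actual values from our setup):

$ echo "sessname=mdtest path=ip:<server-ip> device_path=/dev/md400" > \
      /sys/devices/virtual/rnbd-client/ctl/map_device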
The md400 device is mapped on the client side, and the fio profile I
run is as follows:

$ cat fio_single.ini
[global]
description=Emulation of Storage Server Access Pattern
bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4
fadvise_hint=0
rw=randrw:2
direct=1
random_distribution=zipf:1.2
time_based=1
runtime=60
ramp_time=1
ioengine=libaio
iodepth=128
iodepth_batch_submit=128
iodepth_batch_complete_min=1
iodepth_batch_complete_max=128
numjobs=10
group_reporting

[job1]
filename=/dev/rnbd0
do_verify=1

The hang is easily reproducible; I hit it almost every time within the
first 30 seconds of the fio run.

We see two different types of stack traces and lockups, both in dmesg
and when we dump the stack for every CPU. I have attached both of them
as separate files.
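
The per-CPU stacks (the sysrq_l attachments) were collected with
sysrq-l, i.e. roughly:

$ echo l > /proc/sysrq-trigger

which makes the kernel print a backtrace for all active CPUs to dmesg.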

(PS: We have also run some tests on the v6.1 kernel and hit the same
hang there. We have not tested any other kernel versions.)

Regards
-Haris

Attachment: sysrq_l_6_11_2
Description: Binary data

Attachment: sysrq_l_6_11_1
Description: Binary data

