The following commit causes a CPU#x soft lockup:
commit 301867b1c16805aebbc306aafa6ecdc68b73c7e5
Author: Li Nan <linan122@xxxxxxxxxx>
Date: Mon May 15 21:48:05 2023 +0800
md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
Message from syslogd@rhel9 at Apr 19 14:14:55 ...
kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976]
dmesg:
[ 104.245585] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1
[ 104.245588] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
[ 104.245590] RIP: 0010:_raw_spin_unlock_irq+0x13/0x30
[ 104.245598] Code: 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 90 90 90 fb 65 ff 0d 95 9f 75 76 <74> 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc cc cc cc cc cc
[ 104.245601] RSP: 0018:ffffb2d74a81bbf8 EFLAGS: 00000246
[ 104.245603] RAX: 0000000000000000 RBX: 0000000001000000 RCX: 000000000000000c
[ 104.245604] RDX: 0000000000000000 RSI: 0000000001000000 RDI: ffff926160ccd200
[ 104.245606] RBP: ffffb2d74a81bcd0 R08: 0000000000000013 R09: 0000000000000000
[ 104.245607] R10: 0000000000000000 R11: ffffb2d74a81bad8 R12: 0000000000000000
[ 104.245608] R13: 0000000000000000 R14: ffff926160ccd200 R15: ffff926151019000
[ 104.245611] FS: 0000000000000000(0000) GS:ffff9273f9580000(0000) knlGS:0000000000000000
[ 104.245613] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 104.245614] CR2: 00007f23774d2584 CR3: 0000000104098003 CR4: 0000000000370ef0
[ 104.245616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 104.245617] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 104.245618] Call Trace:
[ 104.245620] <IRQ>
[ 104.245623] ? watchdog_timer_fn+0x1e3/0x260
[ 104.245630] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 104.245634] ? __hrtimer_run_queues+0x112/0x2a0
[ 104.245638] ? hrtimer_interrupt+0xff/0x240
[ 104.245640] ? sched_clock+0xc/0x30
[ 104.245644] ? __sysvec_apic_timer_interrupt+0x54/0x140
[ 104.245649] ? sysvec_apic_timer_interrupt+0x6c/0x90
[ 104.245652] </IRQ>
[ 104.245653] <TASK>
[ 104.245654] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 104.245659] ? _raw_spin_unlock_irq+0x13/0x30
[ 104.245661] md_bitmap_start_sync+0x6b/0xf0
[ 104.245668] raid10_sync_request+0x25c/0x1b40 [raid10]
[ 104.245676] ? is_mddev_idle+0x132/0x150
[ 104.245680] md_do_sync+0x64b/0x1020
[ 104.245683] ? __pfx_autoremove_wake_function+0x10/0x10
[ 104.245690] md_thread+0xa7/0x170
[ 104.245693] ? __pfx_md_thread+0x10/0x10
[ 104.245696] kthread+0xcf/0x100
[ 104.245700] ? __pfx_kthread+0x10/0x10
[ 104.245704] ret_from_fork+0x30/0x50
[ 104.245707] ? __pfx_kthread+0x10/0x10
[ 104.245710] ret_from_fork_asm+0x1a/0x30
[ 104.245714] </TASK>
The soft lockup occurs when running the following reproducer script:
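#!/bin/bash
# bash is needed for the {1..10} brace expansion in the loop below

vg=t
lv=t
devs="/dev/sd[c-j]"   # glob, expanded where $devs is used unquoted
sz=3G                 # initial LV size
isz=2G                # size added by each lvextend
path=/dev/$vg/$lv
mnt=/mnt/$lv

# Create the VG and a raid10 LV, skipping the initial sync
vgcreate -y $vg $devs
lvcreate --yes --nosync --type raid10 -i 2 -n $lv -L $sz $vg

# Put a mounted XFS filesystem on the LV
mkfs.xfs $path
mkdir -p $mnt
mount $path $mnt
df -h

# Grow the LV and the filesystem ten times
for i in {1..10}
do
    lvextend -y -L +$isz -r $path
    lvs
done
lvs -a -o +devices

# Start a check of the array
lvchange --syncaction check $path

# To watch the progress, run:
# lvs -ovgname,lvname,copypercent t/t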
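
To follow the progress from a second terminal while waiting for the lockup, a small polling helper along these lines might be convenient. This is only a sketch and not part of the original reproducer: the function name watch_sync, the default of 30 polls, and the 1 second interval are illustrative assumptions. The lvs fields are the same ones used in the command above.

```shell
# Hypothetical helper: poll the copy/sync percentage of an LV.
#   $1  VG/LV to watch (e.g. t/t)
#   $2  number of polls (assumed default: 30)
#   $3  seconds between polls (assumed default: 1)
watch_sync() {
    target=$1
    count=${2:-30}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Same fields as the lvs command in the reproducer
        lvs --noheadings -o vgname,lvname,copypercent "$target"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: watch_sync t/t
```

If the percentage stops advancing while the mdX_resync thread is spinning, that would be consistent with the soft lockup reported above.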