Hi, sorry for the delay. I had to give the nodes away and we had a week of
teambuilding and a company party, so for the past week I only managed to
hack away at the debug-symbol stripping issue, get another node and set it
up. The experiments below are based on a vanilla 6.9.8 kernel *without*
your patch.

On Mon, 2024-07-15 at 09:56 +0800, Yu Kuai wrote:

> Line number will be helpful.

After tinkering with the build scripts I managed to build the modules with
debug symbols (not the kernel itself, but that should be good enough).
For some reason the kernel still doesn't show line numbers in stack traces,
and I have no idea what could be causing that, so I had to decode the line
numbers manually. Below is the output in which I inserted the line numbers
for raid456 by hand after decoding them with `gdb`; the `raid5d at ...`
frames are the ones I substituted, so please ignore their timestamps.
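In case it helps, the decoding itself was along these lines (just a sketch:
the module path is where the object ends up in my build tree, and OFFSET
stands for the hex offset printed in the raw, undecoded raid5d frame):

    gdb ./drivers/md/raid456.ko
    (gdb) info line *(raid5d+OFFSET)

gdb then reports the corresponding drivers/md/raid5.c line, which is what I
pasted into the traces below.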
[…]
[ 1677.293366] <TASK>
[ 1677.293661] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1677.293972] ? _raw_spin_unlock_irq+0x10/0x30
[ 1677.294276] ? _raw_spin_unlock_irq+0xa/0x30
[ 1677.294586] raid5d at drivers/md/raid5.c:6572
[ 1677.294910] md_thread+0xc1/0x170
[ 1677.295228] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1677.295545] ? __pfx_md_thread+0x10/0x10
[ 1677.295870] kthread+0xff/0x130
[ 1677.296189] ? __pfx_kthread+0x10/0x10
[ 1677.296498] ret_from_fork+0x30/0x50
[ 1677.296810] ? __pfx_kthread+0x10/0x10
[ 1677.297112] ret_from_fork_asm+0x1a/0x30
[ 1677.297424] </TASK>
[…]
[ 1705.296253] <TASK>
[ 1705.296554] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1705.296864] ? _raw_spin_unlock_irq+0x10/0x30
[ 1705.297172] ? _raw_spin_unlock_irq+0xa/0x30
[ 1677.294586] raid5d at drivers/md/raid5.c:6597
[ 1705.297794] md_thread+0xc1/0x170
[ 1705.298099] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1705.298409] ? __pfx_md_thread+0x10/0x10
[ 1705.298714] kthread+0xff/0x130
[ 1705.299022] ? __pfx_kthread+0x10/0x10
[ 1705.299333] ret_from_fork+0x30/0x50
[ 1705.299641] ? __pfx_kthread+0x10/0x10
[ 1705.299947] ret_from_fork_asm+0x1a/0x30
[ 1705.300257] </TASK>
[…]
[ 1733.296255] <TASK>
[ 1733.296556] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1733.296862] ? _raw_spin_unlock_irq+0x10/0x30
[ 1733.297170] ? _raw_spin_unlock_irq+0xa/0x30
[ 1677.294586] raid5d at drivers/md/raid5.c:6572
[ 1733.297792] md_thread+0xc1/0x170
[ 1733.298096] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1733.298403] ? __pfx_md_thread+0x10/0x10
[ 1733.298711] kthread+0xff/0x130
[ 1733.299018] ? __pfx_kthread+0x10/0x10
[ 1733.299330] ret_from_fork+0x30/0x50
[ 1733.299637] ? __pfx_kthread+0x10/0x10
[ 1733.299943] ret_from_fork_asm+0x1a/0x30
[ 1733.300251] </TASK>

> Meanwhile, can you check if the underlying disks has IO while raid5
> stuck, by /sys/block/[device]/inflight.

The two devices that are left after the third one is removed have these
numbers, which don't change over time:

[Mon Jul 22 20:18:06 @ ~]:> for d in dm-19 dm-17; do echo -n $d; cat /sys/block/$d/inflight; done
dm-19 9 1
dm-17 11 2
[Mon Jul 22 20:18:11 @ ~]:> for d in dm-19 dm-17; do echo -n $d; cat /sys/block/$d/inflight; done
dm-19 9 1
dm-17 11 2

They also don't change after I bring the disk back (which is to be
expected, I guess, given that the lockup doesn't go away).

> > > At first, can the problem reporduce with raid1/raid10? If not, this
> > > is probably a raid5 bug.
> > 
> > This is not reproducible with raid1 (i.e. no lockups for raid1), I
> > tested that. I didn't test raid10, if you want I can try (but probably
> > only after the weekend, because today I was asked to give the nodes
> > away, for the weekend at least, to someone else).
> 
> Yes, please try raid10 as well. For now I'll say this is a raid5
> problem.

Tested: raid10 works just fine, i.e. there is no lockup and fio keeps
showing non-zero IOPS.

> > > The best will be that if I can reporduce this problem myself.
> > > The problem is that I don't understand the step 4: turning off jbod
> > > slot's power, is this only possible for a real machine, or can I do
> > > this in my VM?
> > 
> > Well, let's say that if it is possible, I don't know a way to do that.
> > The `sg_ses` commands that I used
> > 
> >     sg_ses --dev-slot-num=9 --set=3:4:1 /dev/sg26     # turning off
> >     sg_ses --dev-slot-num=9 --clear=3:4:1 /dev/sg26   # turning on
> > 
> > …sets and clears the value of the 3:4:1 bit, where the bit is defined
> > by the JBOD's manufacturer datasheet. The 3:4:1 specifically is defined
> > by "AIC" manufacturer. That means the command as is unlikely to work on
> > a different hardware.
> 
> I never do this before, I'll try.
> 
> > Well, while on it, do you have any thoughts why just using a `echo 1 >
> > /sys/block/sdX/device/delete` doesn't reproduce it? Does perhaps kernel
> > not emulate device disappearance too well?
> 
> echo 1 > delete just delete the disk from kernel, and scsi/dm-raid will
> know that this disk is deleted. However, the disk will stay in kernel
> for the other way, dm-raid does not aware that underlying disks are
> problematic and IO will still be generated and issued.
> 
> Thanks,
> Kuai