Hi, sorry for the delay. I had to give the nodes away and we had a week of
teambuilding and a company party, so for the past week I only managed to
hack away at the debug-symbol stripping issue, get another node and set it
up. The experiments below are based on a vanilla 6.9.8 kernel *without*
your patch.

On Mon, 2024-07-15 at 09:56 +0800, Yu Kuai wrote:

> Line number will be helpful.

After tinkering with the build scripts I managed to build the modules with
debug symbols (not the kernel itself, but that should be good enough).
For some reason the kernel still doesn't show line numbers in stack traces,
and I have no idea what could be causing that, so I had to decode the line
numbers manually. Below is the output in which I inserted the line numbers
for raid456 by hand after decoding them with `gdb`; the `raid5d at ...`
frames are the ones I substituted, so please ignore their timestamps.
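In case it helps, the decoding itself was along these lines (just a sketch:
the module path is where the object ends up in my build tree, and OFFSET
stands for the hex offset printed in the raw, undecoded raid5d frame):

    gdb ./drivers/md/raid456.ko
    (gdb) info line *(raid5d+OFFSET)

gdb then reports the corresponding drivers/md/raid5.c line, which is what I
pasted into the traces below.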
[…]
[ 1677.293366] <TASK>
[ 1677.293661] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1677.293972] ? _raw_spin_unlock_irq+0x10/0x30
[ 1677.294276] ? _raw_spin_unlock_irq+0xa/0x30
[ 1677.294586] raid5d at drivers/md/raid5.c:6572
[ 1677.294910] md_thread+0xc1/0x170
[ 1677.295228] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1677.295545] ? __pfx_md_thread+0x10/0x10
[ 1677.295870] kthread+0xff/0x130
[ 1677.296189] ? __pfx_kthread+0x10/0x10
[ 1677.296498] ret_from_fork+0x30/0x50
[ 1677.296810] ? __pfx_kthread+0x10/0x10
[ 1677.297112] ret_from_fork_asm+0x1a/0x30
[ 1677.297424] </TASK>
[…]
[ 1705.296253] <TASK>
[ 1705.296554] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1705.296864] ? _raw_spin_unlock_irq+0x10/0x30
[ 1705.297172] ? _raw_spin_unlock_irq+0xa/0x30
[ 1677.294586] raid5d at drivers/md/raid5.c:6597
[ 1705.297794] md_thread+0xc1/0x170
[ 1705.298099] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1705.298409] ? __pfx_md_thread+0x10/0x10
[ 1705.298714] kthread+0xff/0x130
[ 1705.299022] ? __pfx_kthread+0x10/0x10
[ 1705.299333] ret_from_fork+0x30/0x50
[ 1705.299641] ? __pfx_kthread+0x10/0x10
[ 1705.299947] ret_from_fork_asm+0x1a/0x30
[ 1705.300257] </TASK>
[…]
[ 1733.296255] <TASK>
[ 1733.296556] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 1733.296862] ? _raw_spin_unlock_irq+0x10/0x30
[ 1733.297170] ? _raw_spin_unlock_irq+0xa/0x30
[ 1677.294586] raid5d at drivers/md/raid5.c:6572
[ 1733.297792] md_thread+0xc1/0x170
[ 1733.298096] ? __pfx_autoremove_wake_function+0x10/0x10
[ 1733.298403] ? __pfx_md_thread+0x10/0x10
[ 1733.298711] kthread+0xff/0x130
[ 1733.299018] ? __pfx_kthread+0x10/0x10
[ 1733.299330] ret_from_fork+0x30/0x50
[ 1733.299637] ? __pfx_kthread+0x10/0x10
[ 1733.299943] ret_from_fork_asm+0x1a/0x30
[ 1733.300251] </TASK>

> Meanwhile, can you check if the underlying disks has IO while raid5
> stuck, by /sys/block/[device]/inflight.

The two devices that are left after the third one is removed have these
numbers, which don't change over time:

[Mon Jul 22 20:18:06 @ ~]:> for d in dm-19 dm-17; do echo -n $d; cat /sys/block/$d/inflight; done
dm-19 9 1
dm-17 11 2
[Mon Jul 22 20:18:11 @ ~]:> for d in dm-19 dm-17; do echo -n $d; cat /sys/block/$d/inflight; done
dm-19 9 1
dm-17 11 2

They also don't change after I bring the disk back (which is to be
expected, I guess, given that the lockup doesn't go away).

> > > At first, can the problem reporduce with raid1/raid10? If not, this
> > > is probably a raid5 bug.
> > 
> > This is not reproducible with raid1 (i.e. no lockups for raid1), I
> > tested that. I didn't test raid10, if you want I can try (but probably
> > only after the weekend, because today I was asked to give the nodes
> > away, for the weekend at least, to someone else).
> 
> Yes, please try raid10 as well. For now I'll say this is a raid5
> problem.

Tested: raid10 works just fine, i.e. there is no lockup and fio keeps
showing non-zero IOPS.

> > > The best will be that if I can reporduce this problem myself.
> > > The problem is that I don't understand the step 4: turning off jbod
> > > slot's power, is this only possible for a real machine, or can I do
> > > this in my VM?
> > 
> > Well, let's say that if it is possible, I don't know a way to do that.
> > The `sg_ses` commands that I used
> > 
> >     sg_ses --dev-slot-num=9 --set=3:4:1 /dev/sg26     # turning off
> >     sg_ses --dev-slot-num=9 --clear=3:4:1 /dev/sg26   # turning on
> > 
> > …sets and clears the value of the 3:4:1 bit, where the bit is defined
> > by the JBOD's manufacturer datasheet. The 3:4:1 specifically is defined
> > by "AIC" manufacturer. That means the command as is unlikely to work on
> > a different hardware.
> 
> I never do this before, I'll try.
> 
> > Well, while on it, do you have any thoughts why just using a `echo 1 >
> > /sys/block/sdX/device/delete` doesn't reproduce it? Does perhaps kernel
> > not emulate device disappearance too well?
> 
> echo 1 > delete just delete the disk from kernel, and scsi/dm-raid will
> know that this disk is deleted. However, the disk will stay in kernel
> for the other way, dm-raid does not aware that underlying disks are
> problematic and IO will still be generated and issued.
> 
> Thanks,
> Kuai