Re: LVM kernel lockup scenario during lvcreate

Hi,

On 2023/08/24 19:29, Laurence Oberman wrote:

On Mon, 2023-06-12 at 11:40 -0700, Bart Van Assche wrote:
On 6/9/23 00:29, Jaco Kroon wrote:
I'm attaching dmesg -T and ps axf.  dmesg in particular may provide
clues as it provides a number of stack traces indicating stalling at
IO time.

Once this has triggered, even commands such as "lvs" go into
uninterruptible wait. I unfortunately didn't test "dmsetup ls" this
time and have already triggered a reboot (system needs to be up).

To me the call traces suggest that an I/O request got stuck.
Unfortunately call traces are not sufficient to identify the root
cause when I/O gets stuck. Has debugfs been mounted? If so, how about
dumping the contents of /sys/kernel/debug/block/ into a tar file after
the lockup has been reproduced and sharing that information?

tar -czf- -C /sys/kernel/debug/block . >block.tgz

Thanks,

Bart.
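
A possible alternative to the tar command above, offered as a sketch only: debugfs files typically report a size of zero, so some archivers may store them as empty files. Dumping the contents as text avoids depending on the reported size. The helper name dump_block_debugfs and the output file name are made up for illustration, and the default path assumes debugfs is mounted at /sys/kernel/debug.

```shell
# Sketch: write every readable attribute under the block debugfs tree
# to one text file as "path:content" lines.
# dump_block_debugfs is a hypothetical helper, not part of any tool.
# If debugfs is not mounted, this usually mounts it (as root):
#   mount -t debugfs none /sys/kernel/debug
dump_block_debugfs() {
    local src="${1:-/sys/kernel/debug/block}"   # assumed debugfs location
    local out="${2:-block-debugfs.txt}"         # illustrative output name
    # grep -aH prefixes each non-empty line with its file name;
    # errors from unreadable attributes are discarded.
    (cd "$src" && find . -type f -exec grep -aH . {} + 2>/dev/null) > "$out"
    # Succeed only if something was actually captured.
    [ -s "$out" ]
}
```

Run as root after the lockup has been reproduced, for example: dump_block_debugfs /sys/kernel/debug/block block-debugfs.txt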

One I am aware of is this
commit 106397376c0369fcc01c58dd189ff925a2724a57
Author: David Jeffery <djeffery@xxxxxxxxxx>

Can we try to get a vmcore (assuming it's not a secure site)?

Certainly.  Obviously, on any host handling any kind of sensitive data there is a likelihood that sensitive data may be present in the vmcore. That said, I'm more than happy to create a vmcore. I'm assuming this will create a kernel version of a core dump ... with 256GB of RAM (most of which goes towards disk caches) I'm further assuming this file can potentially be large.  Where will this get stored should the capture be made?  (I need to ensure that the filesystem has sufficient storage available.)
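
On the storage question, a rough worst-case check can be sketched as follows. Where kdump writes the vmcore depends on the distribution's kdump configuration (/var/crash is a common default on some distributions, but that is an assumption here). Setups that filter the dump with makedumpfile usually produce a file much smaller than RAM because cache and free pages can be excluded, so comparing free space against total RAM is deliberately pessimistic. The function name has_room_for_vmcore is hypothetical.

```shell
# Sketch: succeed if DIR has at least as much free space as total RAM.
# has_room_for_vmcore is a hypothetical helper; the second argument
# exists only so the check can be pointed at a different meminfo file.
has_room_for_vmcore() {
    local dir="$1" meminfo="${2:-/proc/meminfo}"
    local mem_kb free_kb
    # MemTotal is reported in kB.
    mem_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    # df -Pk prints one line per filesystem; column 4 is available kB.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    echo "RAM: ${mem_kb} kB, free in ${dir}: ${free_kb} kB"
    [ "$free_kb" -ge "$mem_kb" ]
}
```

For example: has_room_for_vmcore /var/crash (substitute whichever directory kdump is actually configured to use).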


Add these to /etc/sysctl.conf:

kernel.panic_on_io_nmi = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.unknown_nmi_panic = 1

Run sysctl -p
Ensure kdump is running and can capture a vmcore

Done.  Had to enable a few extra kernel options to get all the other requirements, so scheduled a reboot to activate the new kernel. This will happen on Saturday morning very early.
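
After the reboot, the setup can be sanity-checked with something like the sketch below. It reads the three sysctl values back from /proc/sys/kernel and checks /sys/kernel/kexec_crash_loaded, which reads 1 once a crash kernel has been loaded. The function names are hypothetical, and the optional path arguments are only there so the checks can be run against other locations.

```shell
# Sketch: confirm the three NMI panic sysctls are set to 1.
# check_nmi_panic_settings is a hypothetical helper name.
check_nmi_panic_settings() {
    local root="${1:-/proc/sys/kernel}" key rc=0
    for key in panic_on_io_nmi panic_on_unrecovered_nmi unknown_nmi_panic; do
        if [ "$(cat "$root/$key" 2>/dev/null)" = "1" ]; then
            echo "kernel.$key = 1"
        else
            echo "kernel.$key is not set to 1" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Sketch: succeed if a crash (kdump) kernel is currently loaded.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

Usage: check_nmi_panic_settings && crash_kernel_loaded && echo "ready to capture a vmcore"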

When it locks up again, send an NMI via the SuperMicro web management
interface.

Is it possible to send this via sysrq from the keyboard?  Otherwise I'll just need to set up the RMI. It would be easier to do this from the keyboard if possible, though that's not always an option if it's left too late.
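
For reference, the sysrq route can be sketched as follows. This is not an NMI: sysrq "c" makes the kernel crash itself, which hands over to kdump if a crash kernel is loaded, but it only works while the machine can still process keyboard input or run a shell. On the keyboard the combination is Alt+SysRq+c, which requires the kernel.sysrq sysctl to allow it. Writing to /proc/sysrq-trigger as root has the same effect. The function names below are hypothetical, and calling trigger_crash with its default path crashes the machine immediately.

```shell
# Sketch: allow all sysrq functions from the keyboard.
# enable_sysrq is a hypothetical helper name.
enable_sysrq() {
    echo 1 > "${1:-/proc/sys/kernel/sysrq}"
}

# Sketch: request an immediate kernel crash via sysrq "c".
# WARNING: with the default path this crashes the running system.
# trigger_crash is a hypothetical helper name.
trigger_crash() {
    echo c > "${1:-/proc/sysrq-trigger}"
}
```

If the box is too far gone to accept input, an NMI from the management controller remains the more reliable trigger.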


Share the vmcore, or we can have you capture some specifics from it to triage.

I'd prefer you let me know what you need ... security concerns and all ... frankly, I highly doubt there is any data that is really so sensitive that it can be classified as "top secret" but we do have NDAs in place prohibiting me from sharing anything that may potentially contain customer related data ...

Kind regards,
Jaco


