Re: LVM kernel lockup scenario during lvcreate

Jaco Kroon <jaco@xxxxxxxxx> · Sat, 26 Aug 2023 20:18:33 +0200

Hi Laurence,

On 2023/08/25 14:01, Laurence Oberman wrote:
Hello, this would usually need an NMI sent from a management interface,
as with the machine locked up there is no guarantee a sysrq c will get
there from the keyboard. You could try though.
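
Whether an NMI or a sysrq c actually ends in a panic (and so in a kdump) depends on a few kernel knobs. A minimal read-only sketch for checking them, assuming the usual sysctl files under /proc/sys/kernel; show_knob is a hypothetical helper name, and some of these files may be absent depending on the kernel configuration:

```shell
#!/bin/sh
# Print the current value of a kernel sysctl, or "unavailable" if the
# file is absent. show_knob is a hypothetical helper for this sketch.
show_knob() {
    f="/proc/sys/kernel/$1"
    if [ -r "$f" ]; then
        echo "$1=$(cat "$f")"
    else
        echo "$1=unavailable"
    fi
}

show_knob sysrq              # non-zero: some or all SysRq functions enabled
show_knob unknown_nmi_panic  # 1: panic when an unknown NMI arrives
show_knob hung_task_panic    # 1: panic when a hung task is detected
```

For a deliberate test crash from a root shell, `echo c > /proc/sysrq-trigger` forces a panic; that is only useful once the crash kernel has been loaded.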

As long as you have the following in /etc/kdump.conf:

path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31

This will capture kernel pages only, so the dump should not be very big.
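
To make the "-d 31" concrete: the makedumpfile dump level is a bitmask of page types to leave out of the dump. A small sketch that decodes a level, using the bit values documented in makedumpfile(8); describe_dump_level is a hypothetical helper name:

```shell
#!/bin/sh
# List the page types that a given makedumpfile dump level (-d) excludes.
# Bit values as documented in makedumpfile(8); the helper name is made up.
describe_dump_level() {
    level=$1
    if [ $((level & 1)) -ne 0 ];  then echo "zero pages"; fi
    if [ $((level & 2)) -ne 0 ];  then echo "non-private cache pages"; fi
    if [ $((level & 4)) -ne 0 ];  then echo "private cache pages"; fi
    if [ $((level & 8)) -ne 0 ];  then echo "user data pages"; fi
    if [ $((level & 16)) -ne 0 ]; then echo "free pages"; fi
}

describe_dump_level 31
```

Level 31 sets all five bits (1 + 2 + 4 + 8 + 16), which is why essentially only kernel pages remain.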

I could work with you privately to get what we need out of the vmcore
and we would avoid transferring it.
Thanks.  This helps.  Let's get a core first (if it's going to happen
again) and then take it from there.

Kind regards,
Jaco

Hello Jaco
These hangs usually require the stacks to see where and why we are
blocked. The vmcore will definitely help in that regard.

Linux crowsnest 6.4.12-uls #1 SMP PREEMPT_DYNAMIC Fri Aug 25 02:46:44 
SAST 2023 x86_64 Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz GenuineIntel 
GNU/Linux

With the patch you referenced.

/proc/vmcore exists post kexec to the "new" kernel, if I just copy that 
do we need anything else?  Once I've copied /proc/vmcore and rebooted 
back into a more "normal" system, how do I start extracting information 
out of that core?
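
One possible sequence, as a sketch only: it assumes the makedumpfile and crash utilities are installed, that both are recent enough for this kernel, and that an uncompressed vmlinux with debug info from the same build as the crashed kernel is available. print_vmcore_steps is a hypothetical helper and both paths are placeholders; it only prints the steps and runs nothing.

```shell
#!/bin/sh
# Print (do not run) the steps for saving and inspecting a vmcore.
# print_vmcore_steps is a hypothetical helper; both paths are placeholders.
print_vmcore_steps() {
    vmlinux=$1   # uncompressed kernel image with debug info, same build
    outdir=$2    # directory on persistent storage
    cat <<EOF
# In the kdump (crash) kernel:
makedumpfile -l --message-level 7 -d 31 /proc/vmcore $outdir/vmcore
# Later, on the normally booted system:
crash $vmlinux $outdir/vmcore
# Inside crash:
#   log            kernel message buffer
#   bt -a          backtraces of the tasks active on each CPU
#   foreach UN bt  backtraces of all uninterruptible (D state) tasks
EOF
}

print_vmcore_steps /usr/src/linux/vmlinux /var/crash
```

A plain copy of /proc/vmcore also works as input to crash, but with 256G of RAM the unfiltered file would be correspondingly large.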

I don't have a kdump binary, or any other seemingly useful tooling, even
though I've got kexec-tools installed (which is where this comes from as
far as I can tell) ... no /etc/kdump.conf either. I followed the
instructions here (with help from other sources):

https://www.kernel.org/doc/Documentation/kdump/kdump.txt

The kdump references I can find w.r.t. /etc/kdump.conf all seem to be
related to Red Hat and Fedora ... neither of which applies (directly) to
my Gentoo environment.

With 256G of RAM I'm assuming crashkernel=512M should be sufficient?
crashkernel=auto doesn't work.
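
As far as I know, crashkernel=auto is a Red Hat-family downstream extension and not something a mainline kernel parses, which would explain it not working here. Whether 512M is enough depends on what the crash kernel has to load, so the sketch below only checks that a reservation took effect and that a crash kernel is loaded. It reads the standard sysfs files; bytes_to_mib is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a byte count to whole MiB (hypothetical helper for this sketch).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

size_file=/sys/kernel/kexec_crash_size      # bytes reserved via crashkernel=
loaded_file=/sys/kernel/kexec_crash_loaded  # 1 once kexec -p has loaded a kernel

if [ -r "$size_file" ]; then
    echo "reserved: $(bytes_to_mib "$(cat "$size_file")") MiB"
fi
if [ -r "$loaded_file" ]; then
    echo "crash kernel loaded: $(cat "$loaded_file")"
fi
```

A reserved size of 0 means the crashkernel= parameter was not honoured at boot.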

The firmware upgrade on the controller killed reboot though ... the BIOS
no longer speaks with the controller, but when performing the update the
kernel immediately noticed that the firmware got upgraded.  So dead in
the water at the moment.

Kind regards,
Jaco