Hi Laurence,
On 2023/08/25 14:01, Laurence Oberman wrote:
Hello, this would usually need an NMI sent from a management interface,
as with it locked up there is no guarantee a sysrq c will get there from
the keyboard. You could try though.
As long as you have the following in /etc/kdump.conf:
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31
This will capture kernel-only pages, so the dump would not be very big.
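
For reference, the "-d 31" dump level is a bitmask of page types that makedumpfile leaves out of the dump, which is why the result stays small. A minimal sketch of decoding it, assuming the bit meanings from the makedumpfile man page (the function name decode_dump_level is made up for illustration):

```shell
#!/bin/sh
# Decode a makedumpfile dump level (-d) into the page types it excludes.
# Bit values as documented in makedumpfile(8):
#   1 = zero pages, 2 = non-private cache, 4 = private cache,
#   8 = user data,  16 = free pages
decode_dump_level() {
    level=$1
    out=""
    [ $((level & 1)) -ne 0 ]  && out="$out zero"
    [ $((level & 2)) -ne 0 ]  && out="$out cache"
    [ $((level & 4)) -ne 0 ]  && out="$out cache-private"
    [ $((level & 8)) -ne 0 ]  && out="$out user"
    [ $((level & 16)) -ne 0 ] && out="$out free"
    echo "excluded:${out:- none}"
}

decode_dump_level 31
```

With all five bits set (31), everything except in-use kernel pages is dropped.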
I could work with you privately to get what we need out of the vmcore
and we would avoid transferring it.
Thanks. This helps. Let's get a core first (if it's going to happen
again) and then take it from there.
Kind regards,
Jaco
Hello Jaco
These hangs usually require the stacks to see where and why we are
blocked. The vmcore will definitely help in that regard.
Linux crowsnest 6.4.12-uls #1 SMP PREEMPT_DYNAMIC Fri Aug 25 02:46:44
SAST 2023 x86_64 Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60GHz GenuineIntel
GNU/Linux
With the patch you referenced.
/proc/vmcore exists after the kexec into the "new" kernel. If I just copy
that, do we need anything else? Once I've copied /proc/vmcore and rebooted
back into a more "normal" system, how do I start extracting information
out of that core?
I don't have a kdump binary, or any other seemingly useful tools, even
though I've got kexec-tools installed (which is where this comes from as
far as I can tell) ... no /etc/kdump.conf either. I followed the
instructions here (with help from other sources):
https://www.kernel.org/doc/Documentation/kdump/kdump.txt
The kdump references I can find w.r.t. /etc/kdump.conf all seem to be
related to Red Hat and Fedora ... neither of which applies (directly) to
my Gentoo environment.
With 256G of RAM I'm assuming a crashkernel=512M should be sufficient?
crashkernel=auto doesn't work.
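
A rough way to confirm that the crashkernel= reservation actually took effect, sketched under the assumption that the running kernel exposes the usual kexec sysfs files (the helper name crash_size_mib is made up for illustration):

```shell
#!/bin/sh
# Convert a byte count (as reported by /sys/kernel/kexec_crash_size) to MiB.
crash_size_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Only report when the kernel exposes the kexec sysfs files.
if [ -r /sys/kernel/kexec_crash_size ]; then
    # 0 means no memory was reserved for the crash kernel.
    echo "reserved: $(crash_size_mib "$(cat /sys/kernel/kexec_crash_size)") MiB"
    # 1 means a crash kernel has been loaded with kexec -p.
    echo "crash kernel loaded: $(cat /sys/kernel/kexec_crash_loaded)"
fi
```

A reserved size of 0 would suggest the crashkernel= parameter was not honoured at boot.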
The firmware upgrade on the controller killed reboot though ... the BIOS
no longer speaks with the controller, but when performing the update the
kernel immediately noticed that the firmware got upgraded. So dead in
the water at the moment.
Kind regards,
Jaco