On 4/15/2019 3:35 PM, Laurence Oberman wrote:
On Mon, 2019-04-15 at 08:39 -0700, Bart Van Assche wrote:
On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote:
On Sun, 2019-04-14 at 23:25 -0400, TomK wrote:
Hey All,
I'm getting a kernel panic on a Gigabyte GA-890XA-UD3 motherboard that
I've got a QLE2464 card in as a target (FC). The kernel has been
crashing / panicking in the last 1-2 months about once a week. Before
that, it was rock solid for 4-5 years. I've upgraded to kernel 4.18.19
but that hasn't made much of a difference. Since the message includes
qla2x00_request_irqs I thought I would try here first.
Tried to get more info on this but:
1) Keyboard doesn't work and locks up when the panic occurs. No USB
ports work. Tried the PS/2 port but nothing.
2) Unable to capture a kdump. Can't get to the kdump vmcore due to 1).
The two screenshots are pretty much all I can capture. Tried things
like clocksource=rtc in the kernel parms and disabling hpet1 but
apparently I haven't disabled it everywhere since it still shows up.
Wondering if anyone recognizes these messages or has any idea what
could be the issue here? Even a hint would be appreciated.
Hello Tom
I have had similar issues and reported them to Himanshu@Cavium.
I have kept all my target servers at kernel 4.5 as it has been the only
version that has always been stable.
If your motherboard has an NMI (virtual or physical), set all of these
in /etc/sysctl.conf, then run sysctl -a;dracut -f and reboot:
kernel.nmi_watchdog = 1
kernel.panic_on_io_nmi = 1
kernel.panic_on_unrecovered_nmi =
kernel.unknown_nmi_panic = 1
When the issue shows up, press the virtual/physical NMI.
This is with the assumption that generic kdump is properly set up, that
dmesg | grep crash shows memory reserved by the crashkernel, and that
you have tested kdump manually.
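
The kdump and NMI assumptions above can be checked before relying on
them. What follows is only a minimal sketch, not a replacement for a
real kdump test: it assumes a Linux system with /proc mounted, and the
helper names has_crashkernel and show_nmi_sysctls are made up for
illustration. It looks for a crashkernel= request on the kernel command
line and prints the four NMI sysctls listed earlier.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line
# contains a crashkernel= reservation request.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the current value of each NMI sysctl
# mentioned in this thread, read straight from /proc/sys/kernel.
show_nmi_sysctls() {
    for name in nmi_watchdog panic_on_io_nmi \
                panic_on_unrecovered_nmi unknown_nmi_panic; do
        f="/proc/sys/kernel/$name"
        if [ -r "$f" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$f")"
        else
            printf 'kernel.%s: not available on this system\n' "$name"
        fi
    done
}

if [ -r /proc/cmdline ] && has_crashkernel "$(cat /proc/cmdline)"; then
    echo "crashkernel= is present on the kernel command line"
else
    echo "crashkernel= not found; kdump has no reserved memory to boot into"
fi
show_nmi_sysctls
```

A manual kdump test is usually done by forcing a crash with
echo c > /proc/sysrq-trigger as root, which takes the machine down
immediately, so only do that on a box that can afford it.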
Another option is to use a USB serial port to capture the full log if
you cannot get kdump to work.
That approach may provide further evidence about kernel bugs but it is
not guaranteed to lead to a solution. It would help if either or both
of you could do the following on a test system:
* Check out branch qla2xxx-for-next of my kernel repo on github
(https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
* Enable lockdep and KASAN in the kernel config (CONFIG_PROVE_LOCKING
and CONFIG_KASAN).
* Build and install that kernel.
* Run your favorite workload.
Please note that the qla2xxx-for-next branch is based on the v5.1-rc1
kernel and hence should not be installed on any production system.
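
For reference, the steps above could be scripted roughly as follows.
This is a sketch under assumptions, not a tested recipe: the clone
directory name linux-qla2xxx, the use of the running kernel's
/boot/config-* file as a starting point, and the shallow clone are all
choices made here for illustration, and the branch may no longer exist
in that form. By default the script only prints the commands; set
DRY_RUN=0 to actually run them, and only on a test system.

```shell
#!/bin/sh
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_debug_kernel() {
    # Fetch only the branch named in this thread (directory name is arbitrary).
    run git clone --depth 1 --branch qla2xxx-for-next \
        https://github.com/bvanassche/linux.git linux-qla2xxx
    run cd linux-qla2xxx
    # Assumption: start from the distro config of the running kernel.
    run cp "/boot/config-$(uname -r)" .config
    # Turn on lockdep (PROVE_LOCKING) and KASAN, then resolve dependencies.
    run scripts/config --enable PROVE_LOCKING --enable KASAN
    run make olddefconfig
    # Build, then install modules and kernel (needs root).
    run make -j"$(nproc)"
    run sudo make modules_install install
}

prepare_debug_kernel
```

Expect the resulting kernel to be noticeably slower than a normal
build, since both lockdep and KASAN add runtime checking.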
Thanks,
Bart.
Hello Bart
OK, I will get to this by Thursday, won't be able to change the
target server kernel until then.
Regards
Laurence
Same. I'll try this out closer to the weekend.
Not an NMI motherboard. This is a 9-10 year old AMD board meant as a
desktop or home server.
I'll have to read more about the USB Serial port to capture further
info. That's interesting.
For the time being, I've disabled HPET in BIOS. ( Appears the kernel
boot parameter method wasn't enough. )
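
One way to see whether HPET is really out of the picture after a change
like this is to look at what the kernel registered as clocksources. A
small sketch, assuming the usual sysfs clocksource files are present;
the helper name lists_hpet is made up for illustration:

```shell
#!/bin/sh
CS_DIR=/sys/devices/system/clocksource/clocksource0

# Hypothetical helper: succeed if "hpet" appears in a space-separated
# list of clocksource names such as "tsc hpet acpi_pm".
lists_hpet() {
    case " $1 " in
        *" hpet "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r "$CS_DIR/available_clocksource" ]; then
    avail="$(cat "$CS_DIR/available_clocksource")"
    echo "current:   $(cat "$CS_DIR/current_clocksource")"
    echo "available: $avail"
    if lists_hpet "$avail"; then
        echo "hpet is still registered as a clocksource"
    else
        echo "hpet is not registered as a clocksource"
    fi
else
    echo "clocksource sysfs files not found on this system"
fi
```

This only covers the clocksource side; dmesg | grep -i hpet is still
worth a look for anything else that mentions it.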
--
Thx,
TK.