On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote:
> On Sun, 2019-04-14 at 23:25 -0400, TomK wrote:
> > Hey All,
> >
> > I'm getting a kernel panic on a Gigabyte GA-890XA-UD3 motherboard
> > that I've got a QLE2464 card in as a target (FC). The kernel has been
> > crashing / panicking in the last 1-2 months about once a week. Before
> > that, it was rock solid for 4-5 years. I've upgraded to kernel
> > 4.18.19 but that hasn't made much of a difference. Since the message
> > includes qla2x00_request_irqs I thought I would try here first.
> >
> > Tried to get more info on this but:
> >
> > 1) Keyboard doesn't work and locks up when the panic occurs. No USB
> > ports work. Tried the PS/2 port but nothing.
> >
> > 2) Unable to capture a kdump. Can't get to the kdump vmcore due to
> > 1).
> >
> > The two screenshots are pretty much all I can capture. Tried things
> > like clocksource=rtc in the kernel parms and disabling hpet1 but
> > apparently I haven't disabled it everywhere since it still shows up.
> >
> > Wondering if anyone recognizes these messages or has any idea what
> > could be the issue here? Even a hint would be appreciated.
> >
>
> Hello Tom
> I have had similar issues and reported them to Himanshu@Cavium.
> I have kept all my target servers at kernel 4.5 as it has been the
> only version that has always been stable.
> If your motherboard has an NMI (virtual or physical), set all of these
> in /etc/sysctl.conf, then run sysctl -a; dracut -f and reboot:
>
> kernel.nmi_watchdog = 1
> kernel.panic_on_io_nmi = 1
> kernel.panic_on_unrecovered_nmi = 1
> kernel.unknown_nmi_panic = 1
>
> When the issue shows up, press the virtual/physical NMI.
>
> This assumes that generic kdump is properly set up, that
> dmesg | grep crash shows memory reserved by the crashkernel, and that
> you have tested kdump manually.
>
> Another option is to use a USB serial port to capture the full log if
> you cannot get kdump to work.

That approach may provide further evidence about kernel bugs, but it is not guaranteed to lead to a solution. It would help if either or both of you could do the following on a test system:
* Check out branch qla2xxx-for-next of my kernel repo on GitHub (https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
* Enable lockdep and KASAN in the kernel config (CONFIG_PROVE_LOCKING and CONFIG_KASAN).
* Build and install that kernel.
* Run your favorite workload.

Please note that the qla2xxx-for-next branch is based on the v5.1-rc1 kernel and hence should not be installed on any production system.

Thanks,

Bart.
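Before relying on an NMI-triggered panic as suggested in the quoted advice, the four sysctls and the crashkernel reservation can be checked from a script. The following is a minimal Python sketch under stated assumptions: sysctl names map to files under /proc/sys (dots become slashes), the four NMI entries exist (they are x86-specific), and the running kernel exposes /sys/kernel/kexec_crash_size and /sys/kernel/kexec_crash_loaded. The function `check_kdump_readiness` is hypothetical and not part of any existing tool. Passing checks do not replace testing kdump manually.

```python
from pathlib import Path
from typing import List, Optional

# The NMI-related sysctls from the advice above; each is expected to read "1".
# Assumption: an x86 kernel, where these entries exist under /proc/sys/kernel.
NMI_SYSCTLS = (
    "kernel.nmi_watchdog",
    "kernel.panic_on_io_nmi",
    "kernel.panic_on_unrecovered_nmi",
    "kernel.unknown_nmi_panic",
)


def sysctl_path(name: str, proc_root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file, e.g. kernel.x -> /proc/sys/kernel/x."""
    return Path(proc_root, *name.split("."))


def read_value(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file does not exist."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def check_kdump_readiness(proc_root: str = "/proc/sys", sys_root: str = "/sys") -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means all checks passed."""
    problems = []
    for name in NMI_SYSCTLS:
        value = read_value(sysctl_path(name, proc_root))
        if value != "1":
            problems.append(f"{name} is {value!r}, expected '1'")
    # Size in bytes of the memory reserved with crashkernel=; 0 means nothing reserved.
    crash_size = read_value(Path(sys_root, "kernel", "kexec_crash_size"))
    if crash_size is None or int(crash_size) == 0:
        problems.append("no crashkernel memory reserved")
    # Reads "1" once a capture (kdump) kernel has been loaded with kexec.
    loaded = read_value(Path(sys_root, "kernel", "kexec_crash_loaded"))
    if loaded != "1":
        problems.append("no crash (kdump) kernel is loaded")
    return problems


if __name__ == "__main__":
    for problem in check_kdump_readiness():
        print("WARNING:", problem)
```

The root directories are parameters so that the checks can be pointed at a copy of the tree; on a live system the defaults are used.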