On Mon, 2019-04-15 at 08:39 -0700, Bart Van Assche wrote:
> On Mon, 2019-04-15 at 08:55 -0400, Laurence Oberman wrote:
> > On Sun, 2019-04-14 at 23:25 -0400, TomK wrote:
> > > Hey All,
> > >
> > > I'm getting a kernel panic on a Gigabyte GA-890XA-UD3 motherboard
> > > that I've got a QLE2464 card in as a target (FC). The kernel has
> > > been crashing / panicking in the last 1-2 months about once a
> > > week. Before that, it was rock solid for 4-5 years. I've upgraded
> > > to kernel 4.18.19 but that hasn't made much of a difference. Since
> > > the message includes qla2x00_request_irqs I thought I would try
> > > here first.
> > >
> > > Tried to get more info on this but:
> > >
> > > 1) Keyboard doesn't work and locks up when the panic occurs. No USB
> > > ports work. Tried the PS/2 port but nothing.
> > >
> > > 2) Unable to capture a kdump. Can't get to the kdump vmcore due to
> > > 1).
> > >
> > > The two screenshots are pretty much all I can capture. Tried things
> > > like clocksource=rtc in the kernel parameters and disabling hpet1,
> > > but apparently I haven't disabled it everywhere since it still
> > > shows up.
> > >
> > > Wondering if anyone recognizes these messages or has any idea what
> > > could be the issue here? Even a hint would be appreciated.
> > >
> >
> > Hello Tom
> >
> > I have had similar issues and reported them to Himanshu@Cavium.
> > I have kept all my target servers at kernel 4.5 as it has been the
> > only version that has always been stable.
> >
> > If your motherboard has an NMI (virtual or physical) set all of
> > these in /etc/sysctl.conf
> > Run sysctl -a;dracut -f and reboot
> >
> > kernel.nmi_watchdog = 1
> > kernel.panic_on_io_nmi = 1
> > kernel.panic_on_unrecovered_nmi =
> > kernel.unknown_nmi_panic = 1
> >
> > When the issue shows up press the virtual/physical NMI
> >
> > This is with the assumption that generic kdump is properly set up
> > and dmesg | grep crash shows memory reserved by the crashkernel and
> > that you have tested kdump manually.
> >
> > Another option is to use a USB serial port to capture the full log
> > if you cannot get kdump to work.
>
> That approach may provide further evidence about kernel bugs but it
> is not guaranteed that that approach will lead to a solution. It would
> help if either or both of you could do the following on a test system:
> * Check out branch qla2xxx-for-next of my kernel repo on github
>   (https://github.com/bvanassche/linux/tree/qla2xxx-for-next).
> * Enable lockdep and KASAN in the kernel config (CONFIG_PROVE_LOCKING
>   and CONFIG_KASAN).
> * Build and install that kernel.
> * Run your favorite workload.
>
> Please note that the qla2xxx-for-next branch is based on the v5.1-rc1
> kernel and hence should not be installed on any production system.
>
> Thanks,
>
> Bart.

Hello Bart

OK, I will get to this by Thursday; I won't be able to change the
target server kernel until then.

Regards
Laurence
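
The NMI sysctl settings and the crashkernel prerequisite mentioned in
the quoted advice could be sanity-checked before waiting for the next
panic. What follows is only a minimal Python sketch under stated
assumptions: the helper names (parse_sysctl_conf, unmet_settings,
has_crashkernel) are hypothetical, and the expected value 1 for
kernel.panic_on_unrecovered_nmi is assumed by analogy with the other
three keys, because the message leaves that value blank. Note also that
sysctl -p is the invocation that loads /etc/sysctl.conf, whereas
sysctl -a only lists the current values.

```python
# Sketch only: check that the NMI-related sysctls from the thread are configured
# and that the kernel command line carries a crashkernel= reservation.
from pathlib import Path

# Assumption: every key is expected to be "1". The original message leaves the
# value of kernel.panic_on_unrecovered_nmi blank; "1" is assumed by analogy.
EXPECTED = {
    "kernel.nmi_watchdog": "1",
    "kernel.panic_on_io_nmi": "1",
    "kernel.panic_on_unrecovered_nmi": "1",  # assumed value
    "kernel.unknown_nmi_panic": "1",
}


def parse_sysctl_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def unmet_settings(settings, expected=EXPECTED):
    """Return the expected keys whose configured value is missing or different."""
    return sorted(k for k, v in expected.items() if settings.get(k) != v)


def has_crashkernel(cmdline):
    """Return True if the kernel command line contains a crashkernel= parameter."""
    return any(tok.startswith("crashkernel=") for tok in cmdline.split())


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    settings = parse_sysctl_conf(conf.read_text()) if conf.exists() else {}
    print("sysctls not set as expected:", unmet_settings(settings))
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print("crashkernel= on command line:", has_crashkernel(cmdline.read_text()))
```

The script only reads configuration; it does not confirm that kdump
actually produces a vmcore, so a manual kdump test is still needed.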