System lockup causes reboot but no panic and no kernel crash dump

Hi all,

I am trying to debug a series of random system freezes / lockups by making use of the kernel lockup detector / NMI watchdog to trigger a kernel crash dump when a system lockup occurs.

We are running the 4.14.93-rt kernel on a quad-core x86_64 Intel Atom SMP machine.

The lockup detector is fully enabled and configured to trigger a panic when a hard or soft lockup occurs:
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1

I also set the following sysctl variables:
kernel.panic = 1
kernel.panic_on_oops = 1
kernel.unknown_nmi_panic = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.panic_on_io_nmi = 1
kernel.softlockup_panic = 1
kernel.hung_task_panic = 1
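
One way to confirm that these settings are actually in effect on the running system is to read them back from /proc/sys. The following is a minimal sketch, not something from the original setup: the names `EXPECTED`, `sysctl_path` and `check_sysctls` are hypothetical, and it assumes every dotted sysctl name maps directly to a path under /proc/sys.

```python
from pathlib import Path

# Expected values, copied from the sysctl list above.
EXPECTED = {
    "kernel.panic": "1",
    "kernel.panic_on_oops": "1",
    "kernel.unknown_nmi_panic": "1",
    "kernel.panic_on_unrecovered_nmi": "1",
    "kernel.panic_on_io_nmi": "1",
    "kernel.softlockup_panic": "1",
    "kernel.hung_task_panic": "1",
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name (kernel.panic) to its file (/proc/sys/kernel/panic)."""
    return Path(root, *name.split("."))


def check_sysctls(expected, root="/proc/sys"):
    """Return {name: (expected, actual)} for every setting that differs.

    A missing or unreadable file is reported with actual = None.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            got = sysctl_path(name, root).read_text().strip()
        except OSError:
            got = None
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    problems = check_sysctls(EXPECTED)
    for name, (want, got) in problems.items():
        print(f"{name}: expected {want}, found {got}")
    if not problems:
        print("all sysctl values match")
```

The `root` parameter exists only so the check can be pointed at a copy of the tree, for example one extracted from a support bundle.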

I also enabled the NMI watchdog via the kernel cmdline (nmi_watchdog=1). 

I configured my system to generate a kernel crash dump using kdump / kexec when a panic occurs. When I trigger a manual kernel panic via /proc/sysrq-trigger, the crash dump mechanism works perfectly. I see a switch to my dump-capture kernel and ramdisk. 
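
Before waiting for a real lockup, it can help to verify that the dump-capture kernel is still loaded at the moment of the test. The sketch below is an illustration under assumptions, with hypothetical function names: it relies on the kernel exposing /sys/kernel/kexec_crash_loaded (which reads 1 when a crash kernel is loaded) and on a crashkernel= reservation being visible in /proc/cmdline.

```python
from pathlib import Path


def crash_kernel_loaded(sys_kernel="/sys/kernel"):
    """True if kexec_crash_loaded reports that a dump-capture kernel is loaded."""
    try:
        return Path(sys_kernel, "kexec_crash_loaded").read_text().strip() == "1"
    except OSError:
        return False  # file missing: kexec/kdump support not available


def crashkernel_reserved(cmdline_path="/proc/cmdline"):
    """True if the boot command line carries a crashkernel= memory reservation."""
    try:
        args = Path(cmdline_path).read_text().split()
    except OSError:
        return False
    return any(arg.startswith("crashkernel=") for arg in args)


if __name__ == "__main__":
    print("crashkernel= on cmdline:", crashkernel_reserved())
    print("capture kernel loaded:  ", crash_kernel_loaded())
    # The manual test described above corresponds to writing "c" to
    # /proc/sysrq-trigger as root; that is deliberately not done here.
```

If both checks pass and a manual panic produces a dump, the capture path itself is working and the question narrows to why no panic is raised during the lockup.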

The problem: when a real-life lockup or system freeze occurs, the system just reboots without generating a crash dump. There is no switch to the dump-capture kernel. As far as I can tell, no panic occurs: I find nothing in the logs and nothing appears on the console.

To replicate the problem: I wrote a small program that runs an infinite, empty while loop. When I run this program on all 4 cores at the maximum real-time priority (SCHED_FIFO) to hog the CPUs, the system locks up completely (no keyboard input, no serial console, no ping reply). The freeze then triggers a reboot (I assume when the watchdog kicks in), but there is no crash dump and no visible kernel panic.
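
The original hog program is not included in the post. As a stand-in, here is a rough Python sketch of the kind of reproducer described above: one process per CPU, each pinned to its CPU and switched to SCHED_FIFO at the maximum priority before spinning in an empty loop. The `--really-hog` flag and all function names are hypothetical additions for safety and illustration. It needs root (or CAP_SYS_NICE) and is expected to freeze the machine.

```python
import multiprocessing
import os
import sys

CONFIRM_FLAG = "--really-hog"  # hypothetical safety flag, not from the original post


def confirmed(argv):
    """Only run when the caller explicitly opts in on the command line."""
    return CONFIRM_FLAG in argv


def hog_cpu(cpu):
    """Pin to one CPU, switch to SCHED_FIFO at maximum priority, spin forever."""
    os.sched_setaffinity(0, {cpu})
    prio = os.sched_get_priority_max(os.SCHED_FIFO)
    # Raises PermissionError without root / CAP_SYS_NICE.
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio))
    while True:
        pass  # empty busy loop: never sleeps, never makes a system call


def main(argv):
    if not confirmed(argv):
        print(f"refusing to run without {CONFIRM_FLAG}: this is expected to freeze the machine")
        return 1
    # One hog per CPU that this process is allowed to run on.
    workers = [
        multiprocessing.Process(target=hog_cpu, args=(cpu,))
        for cpu in sorted(os.sched_getaffinity(0))
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it only on a test machine, and only with the explicit flag. Without the flag it prints a refusal and does nothing.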

I find it strange that the RT throttling mechanism does not prevent a freeze in this case (we did not disable it). Apart from that, I would expect my hog application to be detected as a hung task and cause a panic.
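
To rule out a misconfiguration, the RT throttling settings can be read back from /proc/sys/kernel/sched_rt_runtime_us and sched_rt_period_us. The sketch below uses hypothetical helper names and assumes the usual semantics: RT tasks may run for runtime out of every period, and a runtime of -1 means throttling is disabled. Mainline defaults are 950000 and 1000000 microseconds, but the values on this particular system are not stated in the post.

```python
from pathlib import Path


def read_int(path):
    """Read a single integer from a procfs-style file."""
    return int(Path(path).read_text().strip())


def rt_budget(runtime_us, period_us):
    """Fraction of each period available to RT tasks.

    Returns None when runtime_us is negative, i.e. throttling is disabled.
    """
    if runtime_us < 0:
        return None
    return runtime_us / period_us


def describe(root="/proc/sys/kernel"):
    """Summarise the RT throttling settings found under root."""
    runtime = read_int(Path(root, "sched_rt_runtime_us"))
    period = read_int(Path(root, "sched_rt_period_us"))
    share = rt_budget(runtime, period)
    if share is None:
        return "RT throttling disabled (sched_rt_runtime_us = -1)"
    return f"RT tasks may use {share:.0%} of every {period} us period"


if __name__ == "__main__":
    try:
        print(describe())
    except (OSError, ValueError) as exc:
        print(f"cannot read RT throttling settings: {exc}")
```

With the mainline defaults, non-RT tasks should still get roughly 5% of each second on every CPU, so a complete freeze with throttling enabled is worth reporting together with these two values.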

Any input will be greatly appreciated. 

 
Kind regards,

Tom Putzeys
    


