On Sat, Nov 16, 2019 at 6:34 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 11/15/19 4:35 PM, Muni Sekhar wrote:
> > [ Please keep me in CC as I'm not subscribed to the list]
> >
> > Hi All,
> >
> > My kernel is built with the following options:
> >
> > $ cat /boot/config-5.0.1 | grep NO_HZ
> > CONFIG_NO_HZ_COMMON=y
> > CONFIG_NO_HZ_IDLE=y
> > # CONFIG_NO_HZ_FULL is not set
> > CONFIG_NO_HZ=y
> > CONFIG_RCU_FAST_NO_HZ=y
> >
> > I booted with watchdog enabled(nmi_watchdog=1) as given below:
> >
> > BOOT_IMAGE=/boot/vmlinuz-5.0.1
> > root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
> > ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
> > console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
> > crashkernel=384M-:128M
> >
> > When the system is frozen or the kernel is locked up(I noticed that in
> > this state kernel is not responding for ALT-SysRq-<command key>) but
> > watchdog is not triggered. So I want to understand how to enable the
> > watchdog timer and how to verify the basic watchdog functionality
> > behavior?
> >
> > Any pointers on this will be greatly appreciated.
> >
>
> Sorry, I do not have an answer. Please note that you are talking about
> the NMI watchdog, which is completely unrelated to hardware watchdogs
> and not handled by the watchdog subsystem. I would suggest to send
> your question to the Linux kernel mailing list and clearly state
> that you are talking about the NMI watchdog.
>
> Please note that, for the NMI watchdog to do anything, you must have
> CONFIG_HARDLOCKUP_DETECTOR enabled in your kernel configuration. I don't
> know what if anything the configuration options you listed above have
> to do with the NMI watchdog.

Thank you for your response. I enabled the hard and soft lockup
detector config options.
My kernel is built with the following .config options:

CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1

I also set the following sysctls under /proc/sys/:

kernel.softlockup_panic = 1
kernel.hardlockup_panic = 1
kernel.unknown_nmi_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
kernel.hardlockup_all_cpu_backtrace = 1
kernel.panic = 3
kernel.panic_on_io_nmi = 1
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.panic_on_rcu_stall = 1
kernel.panic_print = 31
kernel.sysrq=0x1FF

https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt says:

“By default, the watchdog runs on all online cores. However, on a
kernel configured with NO_HZ_FULL, by default the watchdog runs only
on the housekeeping cores, not the cores specified in the "nohz_full"
boot argument.”

That is why I listed my kernel's CONFIG_NO_HZ* options.

>
> Another possibility, of course, might be to enable a hardware watchdog
> in your system (assuming it supports one). I personally would not trust
> the NMI watchdog because to detect a system hang, after all, there are
> situations where even NMIs no longer work.

Is it possible to tell from dmesg whether my system supports a hardware
watchdog? Assuming it does, how do I enable the hardware watchdog to
debug the system freeze issues?

>
> Guenter

--
Thanks,
Sekhar
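
For reference, the checks discussed in this message could be scripted
roughly as follows. This is only a sketch under stated assumptions: the
function names (lockup_config, lockup_sysctls, hw_watchdogs) and the
directory arguments are illustrative and not part of any kernel tooling,
and the "identity" attribute is only present when the kernel exposes
watchdog attributes in sysfs.

```shell
#!/bin/sh
# Sketch only: helper names and arguments are illustrative assumptions.

# Print the hard/soft lockup detector options from a kernel config file.
# $1: path to the config, e.g. /boot/config-5.0.1
lockup_config() {
    grep -E '^CONFIG_(HARD|SOFT)LOCKUP_DETECTOR=' "$1"
}

# Print "name = value" for each lockup-watchdog sysctl that is readable.
# $1: sysctl directory, normally /proc/sys/kernel
lockup_sysctls() {
    for name in watchdog nmi_watchdog soft_watchdog watchdog_thresh; do
        [ -r "$1/$name" ] && printf '%s = %s\n' "$name" "$(cat "$1/$name")"
    done
    return 0
}

# List hardware watchdog devices registered with the watchdog subsystem.
# $1: class directory, normally /sys/class/watchdog
hw_watchdogs() {
    found=0
    for dev in "$1"/watchdog*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$dev" ] || continue
        found=1
        ident='(identity not exposed)'
        [ -r "$dev/identity" ] && ident=$(cat "$dev/identity")
        printf '%s: %s\n' "${dev##*/}" "$ident"
    done
    [ "$found" -eq 1 ] || echo 'no hardware watchdog registered'
}
```

On a live system these would be called as
"lockup_config /boot/config-5.0.1", "lockup_sysctls /proc/sys/kernel" and
"hw_watchdogs /sys/class/watchdog". Running
"dmesg | grep -i -E 'watchdog|wdt'" alongside them may show whether a
hardware watchdog driver was probed at boot, although the exact message
text depends on the driver.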