On Sun, Nov 17, 2019 at 3:12 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 11/16/19 10:34 AM, Muni Sekhar wrote:
> > On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>
> >> On 11/15/19 7:03 PM, Muni Sekhar wrote:
> >> [ ... ]
> >>>>
> >>>> Another possibility, of course, might be to enable a hardware watchdog
> >>>> in your system (assuming it supports one). I personally would not trust
> >>>> the NMI watchdog because to detect a system hang, after all, there are
> >>>> situations where even NMIs no longer work.
> >>>
> >>> From dmesg, Is it possible to know whether my system supports
> >>> hardware watchdog or not?
> >>> I assume that my system supports the hardware watchdog , then how to
> >>> enable the hardware watchdog to debug the system freeze issues?
> >>>
> >>
> >> Hardware watchdog support really depends on the board type. Most PC
> >> mainboards support a watchdog in the Super-IO chip, but on some it is
> >> not wired correctly. On embedded boards it is often built into the SoC.
> >> The easiest way to see if you have a watchdog would be to check for the
> >> existence of /dev/watchdog. However, on a PC that would most likely
> >> not be there because the necessary module is not auto-loaded.
> >> If you tell us your board type, or better the Super-IO chip on the board,
> >> we might be able to help.
> >
> > I’m having two same configuration systems, in one system I installed
> > the Vanilla kernel and I see the /dev/watchdog and /dev/watchdog0
> > nodes. In other system I’m running with ubuntu distribution kernel,
> > but I don’t see any watchdog device node. So it looks like I need to
> > manually load the kernel module in distro kernel. Is there a way to
> > know what is the corresponding kernel module for /dev/watchdog node?
> >
> > # ls -l /dev/watchdog*
> > crw------- 1 root root 10, 130 Nov 15 17:15 /dev/watchdog
> > crw------- 1 root root 248, 0 Nov 15 17:15 /dev/watchdog0
> >
> > # ps -ax | grep watchdog
> > 678 ? S 0:00 [watchdogd]
> >
> > Regarding Super-IO chip, how to find out the Super-IO chip model?
> >
> You could try to run sensors-detect (from the "sensors" package).
>
> If you can boot a system with /dev/watchdog0, you should see the type
> in /sys/class/watchdog/watchdog0/identity.

I could not find the /sys/class/watchdog/watchdog0/identity and
/sys/class/watchdog/watchdog0/timeout files.

$ ls -l /sys/class/watchdog/watchdog0/
total 0
-r--r--r-- 1 root root 4096 Nov 18 15:12 dev
lrwxrwxrwx 1 root root 0 Nov 18 15:12 device -> ../../../iTCO_wdt.0.auto
drwxr-xr-x 2 root root 0 Nov 18 15:12 power
lrwxrwxrwx 1 root root 0 Nov 18 14:53 subsystem -> ../../../../../../class/watchdog
-rw-r--r-- 1 root root 4096 Nov 18 14:53 uevent

>
> Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
> assuming the watchdog daemon is not running. The watchdog works if the
> system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
> is the timeout in seconds).

Running "sudo cat /dev/watchdog" rebooted my system as expected. I don't
see a timeout node, though; how do I configure the timeout value?

>
> >>
> >> Note though that this won't help to debug the problem. A hardware
> >> watchdog resets the system. It helps to recover, but it is not intended
> >> to help with debugging.
> > How do I use the hardware watchdog to reset my system when system is
> > frozen? It helps me to collect the crashdump and finally helps me to
> > find the root cause for the system frozen issue.
> >
> There won't be a crashdump. It just hard-resets the system.

So is there any other way to capture a crashdump or trigger a soft
reboot once the kernel is locked up?

>
> Guenter

-- 
Thanks,
Sekhar
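
On the missing identity and timeout nodes: one possible explanation, not
confirmed anywhere in this thread, is that the running kernel was built
without CONFIG_WATCHDOG_SYSFS, in which case only dev, device, power,
subsystem and uevent show up under watchdog0. The timeout can still be
requested through the WDIOC_SETTIMEOUT ioctl on the device node. Below is a
minimal sketch, assuming the generic ioctl number encoding used on x86 and
ARM; the helper names read_watchdog_attrs and set_timeout are made up for
illustration.

```python
import fcntl
import os
import struct

# Rebuild the ioctl numbers from <linux/watchdog.h> with the generic _IOC
# layout (assumption: x86/ARM; some architectures encode these differently).
_IOC_WRITE, _IOC_READ = 1, 2


def _ioc(direction, type_char, nr, size):
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


INT_SIZE = struct.calcsize("i")
WDIOC_SETTIMEOUT = _ioc(_IOC_READ | _IOC_WRITE, "W", 6, INT_SIZE)
WDIOC_GETTIMEOUT = _ioc(_IOC_READ, "W", 7, INT_SIZE)


def read_watchdog_attrs(sysfs_dir="/sys/class/watchdog/watchdog0"):
    """Return whichever optional sysfs attributes exist, as {name: text}."""
    attrs = {}
    for name in ("identity", "timeout", "state", "nowayout"):
        try:
            with open(os.path.join(sysfs_dir, name)) as f:
                attrs[name] = f.read().strip()
        except OSError:
            pass  # attribute absent or unreadable; skip it
    return attrs


def set_timeout(seconds, dev_path="/dev/watchdog0"):
    """Request a new timeout and return the value the driver reports back.

    WARNING: opening the device arms the watchdog. The "V" written before
    close asks the driver to disarm it (magic close); with nowayout set
    the watchdog keeps running and the system will reset.
    """
    fd = os.open(dev_path, os.O_WRONLY)
    try:
        reply = fcntl.ioctl(fd, WDIOC_SETTIMEOUT, struct.pack("i", seconds))
        return struct.unpack("i", reply)[0]
    finally:
        os.write(fd, b"V")
        os.close(fd)


if __name__ == "__main__":
    # Read-only check; this never opens /dev/watchdog*.
    print(read_watchdog_attrs())
```

Calling set_timeout(60) as root would ask the driver for a 60 second
timeout; the driver may round or reject the value, so the returned number
is the one to trust. Stop any watchdog daemon first, since the device can
normally be opened by only one process at a time.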