Re: watchdog: how to enable?

On 11/18/19 1:52 AM, Muni Sekhar wrote:
On Sun, Nov 17, 2019 at 3:12 AM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On 11/16/19 10:34 AM, Muni Sekhar wrote:
On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:

On 11/15/19 7:03 PM, Muni Sekhar wrote:
[ ... ]

Another possibility, of course, might be to enable a hardware watchdog
in your system (assuming it supports one). I personally would not trust
the NMI watchdog to detect a system hang; after all, there are
situations where even NMIs no longer work.

From dmesg, is it possible to know whether my system supports a
hardware watchdog or not?
Assuming my system supports a hardware watchdog, how do I enable it
to debug the system freeze issues?


Hardware watchdog support really depends on the board type. Most PC
mainboards support a watchdog in the Super-IO chip, but on some it is
not wired correctly. On embedded boards it is often built into the SoC.
The easiest way to see if you have a watchdog would be to check for the
existence of /dev/watchdog. However, on a PC that would most likely
not be there because the necessary module is not auto-loaded.
If you tell us your board type, or better the Super-IO chip on the board,
we might be able to help.
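
The device-node check described above can be sketched in shell. This is only an illustration: the function name find_watchdog_nodes and its optional directory argument are made up for the example, and an empty result does not prove the board has no watchdog, since the driver module may simply not be loaded.

```shell
# Print every watchdog character device found under a device directory
# (default: /dev). Returns non-zero when none exist.
# find_watchdog_nodes is a hypothetical helper name, not an existing tool.
find_watchdog_nodes() {
    dev_dir="${1:-/dev}"
    found=1
    for node in "$dev_dir"/watchdog*; do
        # -c is true only for character devices; an unmatched glob fails this test
        if [ -c "$node" ]; then
            echo "$node"
            found=0
        fi
    done
    return "$found"
}

find_watchdog_nodes || echo "no watchdog device node found; a driver module may need loading"
```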

I have two systems with the same configuration. On one I installed
a vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
nodes. On the other I'm running the Ubuntu distribution kernel,
and I don't see any watchdog device node. So it looks like I need to
manually load the kernel module in the distro kernel. Is there a way to
find out which kernel module corresponds to the /dev/watchdog node?

# ls -l /dev/watchdog*
crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0

# ps -ax | grep watchdog
    678 ?        S      0:00 [watchdogd]

Regarding the Super-IO chip, how do I find out the Super-IO chip model?

You could try to run sensors-detect (from the "sensors" package).

If you can boot a system with /dev/watchdog0, you should see the type
in /sys/class/watchdog/watchdog0/identity.

I could not find the /sys/class/watchdog/watchdog0/identity and
/sys/class/watchdog/watchdog0/timeout files.
$ ls -l /sys/class/watchdog/watchdog0/
total 0
-r--r--r-- 1 root root 4096 Nov 18 15:12 dev
lrwxrwxrwx 1 root root    0 Nov 18 15:12 device -> ../../../iTCO_wdt.0.auto
drwxr-xr-x 2 root root    0 Nov 18 15:12 power
lrwxrwxrwx 1 root root    0 Nov 18 14:53 subsystem -> ../../../../../../class/watchdog
-rw-r--r-- 1 root root 4096 Nov 18 14:53 uevent
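
The device link in the listing above already hints at the driver (iTCO_wdt.0.auto). A minimal sketch for reading it programmatically follows. The helper names are invented for this example, and it assumes the parent device exposes a driver link whose name matches the module name, which is usual for platform drivers but not guaranteed.

```shell
# Print the name of the device behind a watchdog class entry,
# e.g. iTCO_wdt.0.auto for the listing in this thread.
# Both helper names below are hypothetical.
watchdog_parent_device() {
    class_dir="${1:-/sys/class/watchdog/watchdog0}"
    target=$(readlink "$class_dir/device") || return 1
    basename "$target"
}

# Print the name of the driver bound to that device, if a driver link exists.
watchdog_driver() {
    class_dir="${1:-/sys/class/watchdog/watchdog0}"
    target=$(readlink "$class_dir/device/driver") || return 1
    basename "$target"
}

if [ -e /sys/class/watchdog/watchdog0 ]; then
    watchdog_parent_device || echo "no device link found"
    watchdog_driver || echo "no driver link found"
fi
```

On the machine without a watchdog node, the printed driver name would be the first candidate to try with modprobe.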


Presumably CONFIG_WATCHDOG_SYSFS is not enabled in your configuration.
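
One way to confirm this is to look at the configuration the running kernel was built with. The sketch below assumes the config is installed as /boot/config-$(uname -r), which is common on Ubuntu but not universal; the function name is made up for the example.

```shell
# Succeed if the given kernel config file enables CONFIG_WATCHDOG_SYSFS.
# Default path is an assumption; some systems only provide /proc/config.gz
# or no config file at all. watchdog_sysfs_enabled is a hypothetical name.
watchdog_sysfs_enabled() {
    config_file="${1:-/boot/config-$(uname -r)}"
    grep -q '^CONFIG_WATCHDOG_SYSFS=y' "$config_file" 2>/dev/null
}

if watchdog_sysfs_enabled; then
    echo "CONFIG_WATCHDOG_SYSFS=y: identity and timeout attributes should exist"
else
    echo "CONFIG_WATCHDOG_SYSFS=y not found in the checked config"
fi
```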


Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
assuming the watchdog daemon is not running. The watchdog works if the
system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
is the timeout in seconds).

sudo cat /dev/watchdog rebooted my system perfectly. I don't see a
timeout node; how do I configure the timeout value?

sudo apt-get install watchdog
man watchdog

should tell you. Alternatively, enable CONFIG_WATCHDOG_SYSFS.
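
With CONFIG_WATCHDOG_SYSFS enabled, the current timeout can be read from sysfs. The following is a small sketch that also reports the case seen in this thread, where the attribute is missing; the function name and its optional directory argument are assumptions for illustration.

```shell
# Print the watchdog timeout in seconds from a watchdog class directory,
# or complain on stderr and return non-zero when the attribute is absent
# (as happens without CONFIG_WATCHDOG_SYSFS).
# read_watchdog_timeout is a hypothetical helper name.
read_watchdog_timeout() {
    attr="${1:-/sys/class/watchdog/watchdog0}/timeout"
    if [ -r "$attr" ]; then
        cat "$attr"
    else
        echo "timeout attribute missing at $attr" >&2
        return 1
    fi
}

read_watchdog_timeout || echo "cannot read timeout from sysfs on this system"
```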



Note though that this won't help to debug the problem. A hardware
watchdog resets the system. It helps to recover, but it is not intended
to help with debugging.

How do I use the hardware watchdog to reset my system when it is
frozen? That would help me collect a crash dump and eventually
find the root cause of the freeze.

There won't be a crashdump. It just hard-resets the system.

So is there any other way to capture a crash dump or trigger a
soft reboot once the kernel is locked up?

Not that I know of. I suspect, though, that you either have a hard lockup
where even NMI is non-operational, or NMI doesn't work in your system
to start with.

If you have nmi_watchdog=1 in your kernel command line, /proc/interrupts
should show a non-zero number of NMI interrupts. Do you see that in your system?
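
A quick way to check this is to sum the per-CPU counters on the NMI line. The sketch below assumes the usual /proc/interrupts layout on x86, where the line starts with "NMI:" followed by one count per CPU and a text description; the function name is made up for the example.

```shell
# Sum the per-CPU NMI counts from an interrupts file (default /proc/interrupts).
# Prints 0 when there is no NMI line. nmi_total is a hypothetical helper name.
nmi_total() {
    interrupts_file="${1:-/proc/interrupts}"
    awk '$1 == "NMI:" {
        # Add only the numeric per-CPU fields; skip the trailing description
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) total += $i
    } END { print total + 0 }' "$interrupts_file"
}

if [ -r /proc/interrupts ]; then
    echo "NMI interrupts so far: $(nmi_total)"
    # Show whether the boot command line carries the parameter at all
    grep -o 'nmi_watchdog=[^ ]*' /proc/cmdline 2>/dev/null || echo "nmi_watchdog not set on the command line"
fi
```

Running it twice a few seconds apart shows whether the count is still increasing.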

Guenter


