Re: Query: Best way to know if a watchdog is active (kicked)

Hi, Pratyush

A comment on the background, based on my understanding:

On 08/18/15 at 12:27pm, Pratyush Anand wrote:
> Hi Guenter,
> 
> Thanks a lot for your quick reply.
> 
> On 17/08/2015:10:39:48 PM, Guenter Roeck wrote:
> > On 08/17/2015 10:15 PM, Pratyush Anand wrote:
> > >Hi,
> > >
> > >I am looking for the best way to know whether a watchdog has been kicked and
> > >is active.
> > >
> > >One way I can see is to read the timeout (WDIOC_GETTIMEOUT) and the timeleft
> > >(WDIOC_GETTIMELEFT). If they do not match, the wdt is active.
> > >
> > >But what if we read timeleft just after the watchdog daemon or some other
> > >application has kicked it? Maybe we could read timeleft twice, 1 sec apart.
> > >
> > >Please let me know if there is a better alternative for knowing whether the
> > >watchdog is active. Or would it be good to implement a WDIOC_ACTIVE ioctl?
> > >
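A minimal sketch of the two-sample check described above, written against the <linux/watchdog.h> ioctl interface. The helper names wdt_countdown_seen and wdt_probe_active are hypothetical. Treat the result as a hint only: drivers implement WDIOC_GETTIMELEFT differently (some not at all), and opening the device normally starts the watchdog, so the probe writes the magic close character before closing. Note that this also stops a watchdog that was already running.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Heuristic from this thread: a running countdown reports timeleft < timeout.
 * Two samples about a second apart reduce the chance of reading right after
 * another process's keepalive reset the counter to the full timeout. */
static int wdt_countdown_seen(int timeout, int left_first, int left_second)
{
    return left_first < timeout || left_second < timeout;
}

/* Returns 1 if a countdown was observed, 0 if not, -1 on error
 * (device missing or busy, or an ioctl unsupported by the driver). */
static int wdt_probe_active(const char *path)
{
    int timeout = 0, left_first = 0, left_second = 0;
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int ok = ioctl(fd, WDIOC_GETTIMEOUT, &timeout) == 0 &&
             ioctl(fd, WDIOC_GETTIMELEFT, &left_first) == 0;
    if (ok) {
        sleep(1);
        ok = ioctl(fd, WDIOC_GETTIMELEFT, &left_second) == 0;
    }

    /* open() may itself have started the timer; 'V' asks the core to stop
     * it again on close (ignored under nowayout). A failed write is not
     * treated as a probe error. */
    ssize_t n = write(fd, "V", 1);
    (void)n;
    close(fd);

    return ok ? wdt_countdown_seen(timeout, left_first, left_second) : -1;
}
```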
> > 
> > Normally the watchdog is active if the watchdog device is open, unless the
> > application controlling it explicitly disabled it with WDIOC_SETOPTIONS.
> > Therefore, the controlling application should always know the status.
> > A different application cannot open the watchdog device, so it won't be
> > able to get its status using an ioctl anyway.
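As a sketch of what "the controlling application should always know the status" can look like: the owner records the state itself and changes it only through WDIOC_SETOPTIONS. The struct and function names are hypothetical. WDIOS_DISABLECARD can fail, for example when nowayout is set, so the recorded state is updated only on success.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical state kept by the application that owns /dev/watchdogN. */
struct wdt_owner {
    int fd;
    bool armed; /* what the owner believes the hardware is doing */
};

/* Map the desired state to the WDIOC_SETOPTIONS argument. */
static int wdt_option_for(bool armed)
{
    return armed ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;
}

/* Opening the device normally starts the watchdog. */
static int wdt_owner_open(struct wdt_owner *o, const char *path)
{
    o->armed = false;
    o->fd = open(path, O_WRONLY);
    if (o->fd < 0)
        return -1;
    o->armed = true;
    return 0;
}

/* Explicitly disable or re-enable the card; keep the record in sync. */
static int wdt_owner_set_armed(struct wdt_owner *o, bool armed)
{
    int opt = wdt_option_for(armed);
    if (ioctl(o->fd, WDIOC_SETOPTIONS, &opt) != 0)
        return -1; /* e.g. nowayout refuses to stop the timer */
    o->armed = armed;
    return 0;
}
```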
> 
> Yes, a different application cannot open it in parallel, but it can open it
> once the previous application has closed it. For example, this is what I see:
> 
> --------------------------------------------------------------
> # cat /dev/watchdog1 ; sleep 5; wdctl /dev/watchdog1
> cat: /dev/watchdog1: Invalid argument
> wdctl: write failed: Invalid argument
> Device:        /dev/watchdog1
> Identity:      iTCO_wdt [version 0]
> Timeout:       30 seconds
> Timeleft:      24 seconds
> FLAG           DESCRIPTION               STATUS BOOT-STATUS
> KEEPALIVEPING  Keep alive ping reply          0           0
> MAGICCLOSE     Supports magic close char      0           0
> SETTIMEOUT     Set timeout (in seconds)       0           0
> --------------------------------------------------------------
> So cat opened it and kicked it as well, but it could not stop it because the
> magic character "V" had not been received. Therefore, when wdctl opened it and
> read Timeleft, the value differed from Timeout.
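The magic close sequence referred to above is simply a write of 'V' right before close(). A minimal sketch with a hypothetical helper name; in drivers built on the kernel's watchdog core, nowayout keeps the timer running regardless, and drivers that do not advertise WDIOF_MAGICCLOSE stop on any close.

```c
#include <unistd.h>

/* Magic close: write 'V' just before close() so the watchdog core is
 * allowed to stop the timer when the file is released.
 * Returns 0 if both the write and the close succeeded, -1 otherwise. */
static int wdt_magic_close(int fd)
{
    ssize_t n = write(fd, "V", 1);
    int rc = close(fd);
    return (n == 1 && rc == 0) ? 0 : -1;
}
```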
> 
> > 
> > Why is that insufficient?
> 
> Well, let me explain the use case. Consider the situation where:
> -- A system has activated its watchdog to handle software hangs. When the
> software hangs, the wdt causes a reboot; otherwise it is kicked again before
> the timeout.
> -- The same system has also activated kdump (kdump is a method of rebooting
> into a minimal, stable secondary kernel in case of a kernel crash). Now, while
> the wdt was still active, the kernel crashed and the system booted into the
> secondary kernel, which copies crash-related data to a safe location. Since the
> wdt was still active, the hardware rebooted before that process could complete
> in the secondary kernel.
> -- So, the watchdog device needs to be stopped in the secondary kernel as early as

Either stopping it or continuing to kick it before the timeout is fine.

> possible. Loading the driver module itself stops a kicked device. So, if there
> were a way to know from the kernel that a wdt is active, the two daemons (one
> which manages the watchdog and the other which manages kdump) could work
> independently, and the kdump daemon could correctly configure the kdump file
> system to load the relevant watchdog module as early as possible.

Some drivers, like iTCO_wdt, can stop it during module loading, but I'm not sure
all drivers do so, especially in 'nowayout' mode.

So the better way (though still a best-effort solution) is to keep kicking it
before the timeout.
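A minimal sketch of that keep-kicking approach, e.g. for a helper started early in the kdump kernel: ping with WDIOC_KEEPALIVE at half the reported timeout. The function names, the half-timeout interval and the fixed duration are assumptions for illustration. Drivers without WDIOF_KEEPALIVEPING reject the ioctl. Closing without the magic character leaves the watchdog armed, so if the helper exits early the hardware can still reset the machine.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Ping at half the timeout; fall back to 1 s for tiny or unknown timeouts. */
static unsigned int wdt_ping_interval(int timeout)
{
    return timeout >= 2 ? (unsigned int)timeout / 2 : 1;
}

/* Keep the watchdog alive for roughly total_seconds, then close it
 * without 'V' so it stays armed. Returns 0 on success, -1 on error. */
static int wdt_keepalive_for(const char *path, unsigned int total_seconds)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int timeout = 0;
    if (ioctl(fd, WDIOC_GETTIMEOUT, &timeout) != 0)
        timeout = 0; /* unknown: use the 1 s fallback */
    unsigned int step = wdt_ping_interval(timeout);

    for (unsigned int t = 0; t < total_seconds; t += step) {
        if (ioctl(fd, WDIOC_KEEPALIVE, 0) != 0) {
            close(fd); /* e.g. driver lacks WDIOF_KEEPALIVEPING */
            return -1;
        }
        sleep(step);
    }

    close(fd); /* no magic close: a later hang still triggers a reset */
    return 0;
}
```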

> -- Current distro implementations load all the watchdog device driver modules
> in the secondary kernel, which is not nice (the secondary kdump kernel should be
> as minimal as possible).
> 
> ~Pratyush

Thanks
Dave