On 22/01/2019 18.29, Guenter Roeck wrote: > On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote: >> The watchdog framework takes care of feeding a hardware watchdog until >> userspace opens /dev/watchdogN. If that never happens for some reason >> (buggy init script, corrupt root filesystem or whatnot) but the kernel >> itself is fine, the machine stays up indefinitely. This patch allows >> setting an upper limit for how long the kernel will take care of the >> watchdog, thus ensuring that the watchdog will eventually reset the >> machine. >> >> A value of 0 (the default) means infinite timeout, preserving the >> current behaviour. >> >> This is particularly useful for embedded devices where some fallback >> logic is implemented in the bootloader (e.g., use a different root >> partition, boot from network, ...). >> >> There is already handle_boot_enabled serving a similar purpose. However, >> such a binary choice is unsuitable if the hardware watchdog cannot be >> programmed by the bootloader to provide a timeout long enough for >> userspace to get up and running. Many of the embedded devices we see use >> external (gpio-triggered) watchdogs with a fixed timeout of the order of >> 1-2 seconds. >> >> The open timeout is also used as a maximum time for an application to >> re-open /dev/watchdogN after closing it. Again, while the kernel already >> has a nowayout mechanism, using that means userspace is at the mercy of >> whatever timeout the hardware has. >> >> Being a module parameter, one can revert to the ordinary behaviour of >> having the kernel maintain the watchdog indefinitely by simply writing 0 >> to /sys/... after initially opening /dev/watchdog; conversely, one can >> of course also have the current behaviour of allowing indefinite time >> until the first open, and then set that module parameter. >> >> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@xxxxxxxxx> >> --- >> .../watchdog/watchdog-parameters.txt | 8 +++++ >> drivers/watchdog/watchdog_dev.c | 30 +++++++++++++++++-- >> 2 files changed, 36 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt >> index 0b88e333f9e1..907c4bb13810 100644 >> --- a/Documentation/watchdog/watchdog-parameters.txt >> +++ b/Documentation/watchdog/watchdog-parameters.txt >> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on >> providing kernel parameters for builtin drivers versus loadable >> modules. >> >> +The watchdog core parameter watchdog.open_timeout is the maximum time, >> +in seconds, for which the watchdog framework will take care of pinging >> +a hardware watchdog until userspace opens the corresponding >> +/dev/watchdogN device. A value of 0 (the default) means an infinite >> +timeout. Setting this to a non-zero value can be useful to ensure that >> +either userspace comes up properly, or the board gets reset and allows >> +fallback logic in the bootloader to try something else. >> + > > This is misleading. Unless I am missing something, the above only applies > if the watchdog is already runnning at boot, well, yes, if it's not running at boot, there's nothing for the kernel to take care of. I can do s/a hardware watchdog/a running hardware watchdog/ if that makes it clearer. and after it has been opened > and closed once. > > FWIW, I find this operation quite confusing. What is the rationale for not > starting the watchdog at boot time if it is not running and open_timeout > is set, but then refusing to stop it after it has been started once ? I guess you have a point that there's a certain asymmetry there. In practice, the cases where one wants this kind of robustness guarantee do not rely on a watchdog that can/must be started and stopped by software. >> >> static void watchdog_ping_work(struct kthread_work *work) >> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd) >> return -EBUSY; >> } >> >> - if (wdd->ops->stop) { >> + if (wdd->ops->stop && !open_timeout) { > > This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD. > "Turn off the watchdog timer" is well defined and doesn't leave > the option of setting a timeout on it. I can drop this hunk, since it's mostly irrelevant to the actual use cases I have in mind. It makes testing the feature on reference boards a little more awkward, but I can live with that. >> + >> +module_param(open_timeout, uint, 0644); >> +MODULE_PARM_DESC(open_timeout, >> + "Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)"); > > The description is misleading. After the initial open, a subsequent close no > longer really stops the watchdog. If I drop the "&& !open_timeout" above, this would be accurate, no? Rasmus