Re: [PATCH v2 2/8] watchdog: Introduce hardware maximum timeout in watchdog core

Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> · Thu, 13 Aug 2015 23:13:45 +0200

Hello Guenter,

On Fri, Aug 07, 2015 at 10:02:41PM -0700, Guenter Roeck wrote:
> Introduce an optional hardware maximum timeout in the watchdog core.
> The hardware maximum timeout can be lower than the maximum timeout.
> 
> Drivers can set the maximum hardware timeout value in the watchdog data
> structure. If the configured timeout exceeds the maximum hardware timeout,
> the watchdog core enables a timer function to assist sending keepalive
> requests to the watchdog driver.
> 
> Cc: Timo Kokkonen <timo.kokkonen@xxxxxxxxxx>
> Cc: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
> Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx>
> ---
> v2:
> - Improved and hopefully clarified documentation.
> - Rearranged variables in struct watchdog_device such that internal variables
>   come last.
> - The code now ensures that the watchdog times out <timeout> seconds after
>   the most recent keepalive sent from user space.
> - The internal keepalive now stops silently and no longer generates a
>   warning message. Reason is that it will now stop early, while there
>   may still be a substantial amount of time for keepalives from user space
>   to arrive. If such keepalives arrive late (for example if user space
>   is configured to send keepalives just a few seconds before the watchdog
>   times out), the message would just be noise and not provide any value.
> ---
>  Documentation/watchdog/watchdog-kernel-api.txt |  23 +++-
>  drivers/watchdog/watchdog_dev.c                | 140 ++++++++++++++++++++++---
>  include/linux/watchdog.h                       |  26 +++--
>  3 files changed, 163 insertions(+), 26 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-kernel-api.txt b/Documentation/watchdog/watchdog-kernel-api.txt
> index d8b0d3367706..25b00b878a7b 100644
> --- a/Documentation/watchdog/watchdog-kernel-api.txt
> +++ b/Documentation/watchdog/watchdog-kernel-api.txt
> @@ -53,9 +53,12 @@ struct watchdog_device {
>  	unsigned int timeout;
>  	unsigned int min_timeout;
>  	unsigned int max_timeout;
> +	unsigned int max_hw_timeout_ms;
>  	void *driver_data;
> -	struct mutex lock;
>  	unsigned long status;
> +	struct mutex lock;
> +	unsigned long last_keepalive;
> +	struct delayed_work work;
>  	struct list_head deferred;
>  };
>  
> @@ -73,18 +76,28 @@ It contains following fields:
>    additional information about the watchdog timer itself. (Like it's unique name)
>  * ops: a pointer to the list of watchdog operations that the watchdog supports.
>  * timeout: the watchdog timer's timeout value (in seconds).
> +  This is the time after which the system will reboot if user space does
> +  not send a heartbeat request if WDOG_ACTIVE is set.
>  * min_timeout: the watchdog timer's minimum timeout value (in seconds).
>  * max_timeout: the watchdog timer's maximum timeout value (in seconds).
> +* max_hw_timeout_ms: Maximum hardware timeout, in milli-seconds. May differ
> +  from max_timeout. If set to a value larger than max_timeout, the
> +  infrastructure will send a heartbeat to the watchdog driver if 'timeout'
> +  is larger than 'max_hw_timeout / 2', unless WDOG_ACTIVE is set and user
> +  space failed to send a heartbeat for at least 'timeout' seconds.
To properly understand this it would be necessary to know what
max_timeout is. Maybe clearify that, too?

>  * bootstatus: status of the device after booting (reported with watchdog
>    WDIOF_* status bits).
>  * driver_data: a pointer to the drivers private data of a watchdog device.
>    This data should only be accessed via the watchdog_set_drvdata and
>    watchdog_get_drvdata routines.
> -* lock: Mutex for WatchDog Timer Driver Core internal use only.
>  * status: this field contains a number of status bits that give extra
>    information about the status of the device (Like: is the watchdog timer
>    running/active, is the nowayout bit set, is the device opened via
>    the /dev/watchdog interface or not, ...).
> +* lock: Mutex for WatchDog Timer Driver Core internal use only.
> +* last_keepalive: Time of most recent keepalive triggered from user space,
> +  in jiffies.
> +* work: Worker data structure for WatchDog Timer Driver Core internal use only.
>  * deferred: entry in wtd_deferred_reg_list which is used to
>    register early initialized watchdogs.
>  
> @@ -160,7 +173,11 @@ they are supported. These optional routines/operations are:
>    and -EIO for "could not write value to the watchdog". On success this
>    routine should set the timeout value of the watchdog_device to the
>    achieved timeout value (which may be different from the requested one
> -  because the watchdog does not necessarily has a 1 second resolution).
> +  because the watchdog does not necessarily have a 1 second resolution).
> +  Drivers implementing hw_max_timeout_ms set the hardware watchdog timeout
> +  to the minimum of timeout and hw_max_timeout_ms. Those drivers set the
> +  timeout value of the watchdog_device either to the requested timeout value
> +  (if it is larger than hw_max_timeout_ms), or to the achieved timeout value.
>    (Note: the WDIOF_SETTIMEOUT needs to be set in the options field of the
>    watchdog's info structure).
>  * get_timeleft: this routines returns the time that's left before a reset.
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index 06171c73daf5..c04ba1a98cc8 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -37,7 +37,9 @@
>  #include <linux/errno.h>	/* For the -ENODEV/... values */
>  #include <linux/kernel.h>	/* For printk/panic/... */
>  #include <linux/fs.h>		/* For file operations */
> +#include <linux/jiffies.h>	/* For timeout functions */
>  #include <linux/watchdog.h>	/* For watchdog specific items */
> +#include <linux/workqueue.h>	/* For workqueue */
>  #include <linux/miscdevice.h>	/* For handling misc devices */
>  #include <linux/init.h>		/* For __init/__exit/... */
>  #include <linux/uaccess.h>	/* For copy_to_user/put_user/... */
> @@ -49,6 +51,78 @@ static dev_t watchdog_devt;
>  /* the watchdog device behind /dev/watchdog */
>  static struct watchdog_device *old_wdd;
>  
> +static struct workqueue_struct *watchdog_wq;
> +
> +static inline bool watchdog_need_worker(struct watchdog_device *wdd)
> +{

Maybe add:
	/* all variables in milli seconds */

> +	unsigned int hm = wdd->max_hw_timeout_ms;
> +	unsigned int m = wdd->max_timeout * 1000;
> +	unsigned int t = wdd->timeout * 1000;
> +
> +	return watchdog_active(wdd) && hm && (!m || hm < m) && t > hm;

ok, this means:

	watchdog_active(wdd): Userspace thinks the hardware is running
	hm: the driver is aware of framework pinging
	!m: no artificial limit
	hm < m: some artificial limit (does this make sense to have
		max_timeout > DIV_ROUND_UP(max_hw_timeout_ms, 1000), or
		should we better enforce max_timeout = 0 for "good"
		drivers?)
	t > hm: userspace requests longer timeout than hardware can
		handle.

looks good.

> +}
> +
> +static unsigned long watchdog_next_keepalive(struct watchdog_device *wdd)
> +{
> +	unsigned int t = wdd->timeout * 1000;
> +	unsigned long to = wdd->last_keepalive + msecs_to_jiffies(t);
> +	unsigned long last, max_t, tj;
> +
> +	if (wdd->max_hw_timeout_ms && t > wdd->max_hw_timeout_ms)
> +		t = wdd->max_hw_timeout_ms;

watchdog_next_keepalive is only called after watchdog_need_worker
returned true, so there shouldn't be a need to repeat this test, right?

> +	tj = msecs_to_jiffies(t / 2);
> +
> +	/*
> +	 * Ensure that the watchdog times out wdd->timeout seconds
> +	 * after the most recent keepalive sent from user space.
> +	 */
> +	last = to - msecs_to_jiffies(t);
> +	if (time_is_after_jiffies(last))
> +		max_t = last - jiffies;
> +	else
> +		max_t = 0;
> +
> +	if (max_t < tj)
> +		tj = max_t;
> +
> +	return tj;
I find this hard to understand. Maybe the variable names could be
improved, something like:

	t -> hw_timeout_ms
	tj -> keep_alive_interval
	to -> virt_timeout
	last -> last_hw_ping

And I'd do:

	/*
	 * To ensure that the watchdog times out wdd->timeout seconds
	 * after the most recent ping from userspace, the last
	 * worker ping has to come in hw_timeout_ms before this timeout.
	 */
	last_hw_ping = virt_timeout - msecs_to_jiffies(hw_timeout_ms);

	/*
	 * Worker pings usually come at intervals of
	 * max_hw_timeout_ms / 2. This interval is shortend if
	 * virt_timeout is near (or already over).
	 */
	return min_t(long, last_hw_ping - jiffies, keep_alive_interval);

then you have to change the return type to long and ...

> +}
> +
> +static inline void watchdog_update_worker(struct watchdog_device *wdd,
> +					  bool cancel, bool sync)
> +{
> +	if (watchdog_need_worker(wdd)) {
> +		unsigned long t = watchdog_next_keepalive(wdd);
> +
> +		if (t)

... test for t > 0 here.

> +			mod_delayed_work(watchdog_wq, &wdd->work, t);
> +	} else if (cancel) {
> +		if (sync)

Up to now sync is always false. Does this change later? Should this
parameter be introduced only later, too?

> +			cancel_delayed_work_sync(&wdd->work);
> +		else
> +			cancel_delayed_work(&wdd->work);
> +	}
> +}

This function looks artificial, i.e. I don't understand why you would
want to modify the timer or stop it. Probably that is because the timer
does more than increasing the virtual timeout later?

> +static int _watchdog_ping(struct watchdog_device *wdd)
> +{
> +	int err;
> +
> +	if (test_bit(WDOG_UNREGISTERED, &wdd->status))
> +		return -ENODEV;
> +
> +	if (!watchdog_active(wdd))
> +		return 0;
> +
> +	if (wdd->ops->ping)
> +		err = wdd->ops->ping(wdd);  /* ping the watchdog */
> +	else
> +		err = wdd->ops->start(wdd); /* restart watchdog */
> +
> +	return err;
> +}
> +
>  /*
>   *	watchdog_ping: ping the watchdog.
>   *	@wdd: the watchdog device to ping
> @@ -61,26 +135,27 @@ static struct watchdog_device *old_wdd;
>  
>  static int watchdog_ping(struct watchdog_device *wdd)
>  {
> -	int err = 0;
> +	int err;
>  
>  	mutex_lock(&wdd->lock);
> +	err = _watchdog_ping(wdd);
> +	wdd->last_keepalive = jiffies;
I'd switch these two lines. Consider wdd->ops->ping takes some time then
the next ping should timed relative to before the ping, not after it.

> +	watchdog_update_worker(wdd, false, false);
> +	mutex_unlock(&wdd->lock);
>  
> -	if (test_bit(WDOG_UNREGISTERED, &wdd->status)) {
> -		err = -ENODEV;
> -		goto out_ping;
> -	}
> +	return err;
> +}
>  
> -	if (!watchdog_active(wdd))
> -		goto out_ping;
> +static void watchdog_ping_work(struct work_struct *work)
> +{
> +	struct watchdog_device *wdd;
>  
> -	if (wdd->ops->ping)
> -		err = wdd->ops->ping(wdd);	/* ping the watchdog */
> -	else
> -		err = wdd->ops->start(wdd);	/* restart watchdog */
> +	wdd = container_of(to_delayed_work(work), struct watchdog_device, work);
>  
> -out_ping:
> +	mutex_lock(&wdd->lock);
> +	_watchdog_ping(wdd);
> +	watchdog_update_worker(wdd, false, false);
>  	mutex_unlock(&wdd->lock);
> -	return err;
>  }
>  
>  /*
> @@ -107,8 +182,11 @@ static int watchdog_start(struct watchdog_device *wdd)
>  		goto out_start;
>  
>  	err = wdd->ops->start(wdd);
> -	if (err == 0)
> +	if (err == 0) {
>  		set_bit(WDOG_ACTIVE, &wdd->status);
> +		wdd->last_keepalive = jiffies;
> +		watchdog_update_worker(wdd, false, false);
> +	}
>  
>  out_start:
>  	mutex_unlock(&wdd->lock);
> @@ -146,8 +224,10 @@ static int watchdog_stop(struct watchdog_device *wdd)
>  	}
>  
>  	err = wdd->ops->stop(wdd);
> -	if (err == 0)
> +	if (err == 0) {
>  		clear_bit(WDOG_ACTIVE, &wdd->status);
> +		watchdog_update_worker(wdd, true, false);

Regarding my concern about watchdog_update_worker above: After
clear_bit(WDOG_ACTIVE, ...) it's an unconditional cancel_delayed_work
and I'd prefer to not have that disguised. Maybe it would be easier to
understand if there were two different functions, one with the semantic
of watchdog_update_worker(cancel=true) and one with
watchdog_update_worker(cancel=false)?

It's to late here now to review the rest of your patch. Will follow-up
after sleeping.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html