RE: [PATCH v3 2/3] watchdog: mlx-wdt: introduce a watchdog driver for Mellanox systems.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Guenter Roeck [mailto:groeck7@xxxxxxxxx] On Behalf Of Guenter
> Roeck
> Sent: Tuesday, February 19, 2019 7:59 PM
> To: Michael Shych <michaelsh@xxxxxxxxxxxx>; wim@xxxxxxxxxxxxxxxxxx;
> andy@xxxxxxxxxxxxx; dvhart@xxxxxxxxxxxxx
> Cc: linux-watchdog@xxxxxxxxxxxxxxx; platform-driver-x86@xxxxxxxxxxxxxxx;
> Vadim Pasternak <vadimp@xxxxxxxxxxxx>
> Subject: Re: [PATCH v3 2/3] watchdog: mlx-wdt: introduce a watchdog driver
> for Mellanox systems.
> 
> On 2/19/19 9:02 AM, michaelsh@xxxxxxxxxxxx wrote:
> > From: Michael Shych <michaelsh@xxxxxxxxxxxx>
> >
> > Introduce watchdog driver for a various range of Mellanox Ethernet and
> > Infiniband switch systems.
> >
> > Watchdog driver for Mellanox watchdog devices, implemented in
> > programmable logic device.
> >
> > Main and auxiliary watchdog devices can exist on the same system.
> > There are several actions that can be defined in the watchdog:
> > system reset, start fans on full speed and increase a counter.
> > The last 2 actions are performed without a system reset.
> > Actions without reset are provided for auxiliary watchdog devices,
> > which is optional.
> > Access to HW registers is performed through generic
> > regmap interface.
> >
> > There are 2 types of HW watchdog implementations.
> > Type 1: actual HW timeout can be defined as power of 2 msec.
> > e.g. timeout 20 sec will be rounded up to 32768 msec.;
> > maximum timeout period is 32 sec (32768 msec.);
> > get time-left isn't supported
> > Type 2: actual HW timeout is defined in sec. and it's the same as
> > user-defined timeout;
> > maximum timeout is 255 sec;
> > get time-left is supported;
> >
> > Watchdog driver is probed from the common mlx_platform driver.
> >
> > Signed-off-by: Michael Shych <michaelsh@xxxxxxxxxxxx>
> >
> > ---
> > v1->v2
> > Comments pointed out by Guenter:
> > 1. Remove unneeded lock in access functions.
> > 2. Change return  int to void in functions with constant 0 return value
> >     or with unchecked returned code.
> > 3. Use regmap_update_bits() or regmap_update_bits_base() functions.
> > 4. Remove unneeded dev_info messages.
> > 5. Change set_timeout function.
> > 6. Changes in probe
> > 7. Delete unneeded remove function.
> > ---
> > v2->v3
> > Comments pointed out by Guenter:
> > 1. Remove unneedeed define.
> > 2. Remove structure field hw_timeout. Use local var instead it.
> > 3. Remove unnecesarry label.
> > 4. Move enum mlxreg_wdt_type to common mlxreh.h file.
> > 5. Watchdog HW type will be provided by parent platform driver,
> >     no need to find this in mlx-wdt driver.
> > ---
> >   drivers/watchdog/Kconfig   |  16 +++
> >   drivers/watchdog/Makefile  |   1 +
> >   drivers/watchdog/mlx_wdt.c | 290
> +++++++++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 307 insertions(+)
> >   create mode 100644 drivers/watchdog/mlx_wdt.c
> >
> > diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
> > index 57f017d74a97..f1766eb081bb 100644
> > --- a/drivers/watchdog/Kconfig
> > +++ b/drivers/watchdog/Kconfig
> > @@ -241,6 +241,22 @@ config RAVE_SP_WATCHDOG
> >   	help
> >   	  Support for the watchdog on RAVE SP device.
> >
> > +config MLX_WDT
> > +	tristate "Mellanox Watchdog"
> > +	depends on MELLANOX_PLATFORM
> > +	select WATCHDOG_CORE
> > +	select REGMAP
> > +	help
> > +	  This is the driver for the hardware watchdog on Mellanox systems.
> > +	  If you are going to use it, say Y here, otherwise N.
> > +	  This driver can be used together with the watchdog daemon.
> > +	  It can also watch your kernel to make sure it doesn't freeze,
> > +	  and if it does, it reboots your system after a certain amount of
> > +	  time.
> > +
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called mlx-wdt.
> > +
> >   # ALPHA Architecture
> >
> >   # ARM Architecture
> > diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
> > index a0917ef28e07..941b74185c9c 100644
> > --- a/drivers/watchdog/Makefile
> > +++ b/drivers/watchdog/Makefile
> > @@ -142,6 +142,7 @@ obj-$(CONFIG_INTEL_MID_WATCHDOG) += intel-
> mid_wdt.o
> >   obj-$(CONFIG_INTEL_MEI_WDT) += mei_wdt.o
> >   obj-$(CONFIG_NI903X_WDT) += ni903x_wdt.o
> >   obj-$(CONFIG_NIC7018_WDT) += nic7018_wdt.o
> > +obj-$(CONFIG_MLX_WDT) += mlx_wdt.o
> >
> >   # M68K Architecture
> >   obj-$(CONFIG_M54xx_WATCHDOG) += m54xx_wdt.o
> > diff --git a/drivers/watchdog/mlx_wdt.c b/drivers/watchdog/mlx_wdt.c
> > new file mode 100644
> > index 000000000000..0717a48d2b4a
> > --- /dev/null
> > +++ b/drivers/watchdog/mlx_wdt.c
> > @@ -0,0 +1,290 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Mellanox watchdog driver
> > + *
> > + * Copyright (C) 2019 Mellanox Technologies
> > + * Copyright (C) 2019 Michael Shych <mshych@xxxxxxxxxxxx>
> > + */
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/device.h>
> > +#include <linux/errno.h>
> > +#include <linux/log2.h>
> > +#include <linux/module.h>
> > +#include <linux/platform_data/mlxreg.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/regmap.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/types.h>
> > +#include <linux/watchdog.h>
> > +
> > +#define MLXREG_WDT_CLOCK_SCALE		1000
> > +#define MLXREG_WDT_MAX_TIMEOUT_TYPE1	32
> > +#define MLXREG_WDT_MAX_TIMEOUT_TYPE2	255
> > +#define MLXREG_WDT_MIN_TIMEOUT		1
> > +#define MLXREG_WDT_OPTIONS_BASE (WDIOF_KEEPALIVEPING |
> WDIOF_MAGICCLOSE | \
> > +				 WDIOF_SETTIMEOUT)
> > +
> > +/**
> > + * struct mlxreg_wdt - wd private data:
> > + *
> > + * @wdd:	watchdog device;
> > + * @device:	basic device;
> > + * @pdata:	data received from platform driver;
> > + * @regmap:	register map of parent device;
> > + * @timeout:	defined timeout in sec.;
> > + * @action_idx:	index for direct access to action register;
> > + * @timeout_idx:index for direct access to TO register;
> > + * @tleft_idx:	index for direct access to time left register;
> > + * @ping_idx:	index for direct access to ping register;
> > + * @reset_idx:	index for direct access to reset cause register;
> > + * @wd_type:	watchdog HW type;
> > + */
> > +struct mlxreg_wdt {
> > +	struct watchdog_device wdd;
> > +	struct mlxreg_core_platform_data *pdata;
> > +	void *regmap;
> > +	int action_idx;
> > +	int timeout_idx;
> > +	int tleft_idx;
> > +	int ping_idx;
> > +	int reset_idx;
> > +	enum mlxreg_wdt_type wdt_type;
> > +};
> > +
> > +static void mlxreg_wdt_check_card_reset(struct mlxreg_wdt *wdt)
> > +{
> > +	struct mlxreg_core_data *reg_data;
> > +	u32 regval;
> > +	int rc;
> > +
> > +	if (wdt->reset_idx == -EINVAL)
> > +		return;
> > +
> > +	if (!(wdt->wdd.info->options & WDIOF_CARDRESET))
> > +		return;
> > +
> > +	reg_data = &wdt->pdata->data[wdt->reset_idx];
> > +	rc = regmap_read(wdt->regmap, reg_data->reg, &regval);
> > +	if (!rc) {
> > +		if (regval & ~reg_data->mask) {
> > +			wdt->wdd.bootstatus = WDIOF_CARDRESET;
> > +			dev_info(wdt->wdd.parent,
> > +				 "watchdog previously reset the CPU\n");
> > +		}
> > +	}
> > +}
> > +
> > +static int mlxreg_wdt_start(struct watchdog_device *wdd)
> > +{
> > +	struct mlxreg_wdt *wdt = watchdog_get_drvdata(wdd);
> > +	struct mlxreg_core_data *reg_data = &wdt->pdata->data[wdt-
> >action_idx];
> > +
> > +	return regmap_update_bits(wdt->regmap, reg_data->reg,
> ~reg_data->mask,
> > +				  BIT(reg_data->bit));
> > +}
> > +
> > +static int mlxreg_wdt_stop(struct watchdog_device *wdd)
> > +{
> > +	struct mlxreg_wdt *wdt = watchdog_get_drvdata(wdd);
> > +	struct mlxreg_core_data *reg_data = &wdt->pdata->data[wdt-
> >action_idx];
> > +
> > +	return regmap_update_bits(wdt->regmap, reg_data->reg,
> ~reg_data->mask,
> > +				  ~BIT(reg_data->bit));
> > +}
> > +
> > +static int mlxreg_wdt_ping(struct watchdog_device *wdd)
> > +{
> > +	struct mlxreg_wdt *wdt = watchdog_get_drvdata(wdd);
> > +	struct mlxreg_core_data *reg_data = &wdt->pdata->data[wdt-
> >ping_idx];
> > +
> > +	return regmap_update_bits_base(wdt->regmap, reg_data->reg,
> > +				       ~reg_data->mask, BIT(reg_data->bit),
> > +				       NULL, false, true);
> > +}
> > +
> > +static int mlxreg_wdt_set_timeout(struct watchdog_device *wdd,
> > +				  unsigned int timeout)
> > +{
> > +	struct mlxreg_wdt *wdt = watchdog_get_drvdata(wdd);
> > +	struct mlxreg_core_data *reg_data = &wdt->pdata->data[wdt-
> >timeout_idx];
> > +	u32 regval, set_time, hw_timeout;
> > +	int rc;
> > +
> > +	if (wdt->wdt_type == MLX_WDT_TYPE1) {
> > +		rc = regmap_read(wdt->regmap, reg_data->reg, &regval);
> > +		if (rc)
> > +			return rc;
> > +
> > +		hw_timeout = order_base_2(timeout *
> MLXREG_WDT_CLOCK_SCALE);
> > +		regval = (regval & reg_data->mask) | hw_timeout;
> > +		/* Rowndown to actual closest number of sec. */
> > +		set_time = BIT(hw_timeout) / MLXREG_WDT_CLOCK_SCALE;
> > +	} else {
> > +		set_time = timeout;
> > +		regval = timeout;
> > +	}
> > +
> > +	wdd->timeout = set_time;
> > +	rc = regmap_write(wdt->regmap, reg_data->reg, regval);
> > +
> > +	if (!rc) {
> > +		/*
> > +		 * Restart watchdog with new timeout period
> > +		 * if watchdog is already started.
> > +		 */
> > +		if (watchdog_active(wdd)) {
> > +			rc = mlxreg_wdt_stop(wdd);
> > +			if (!rc)
> > +				rc = mlxreg_wdt_start(wdd);
> > +		}
> > +	}
> > +
> > +	return rc;
> > +}
> > +
> > +static unsigned int mlxreg_wdt_get_timeleft(struct watchdog_device
> *wdd)
> > +{
> > +	struct mlxreg_wdt *wdt = watchdog_get_drvdata(wdd);
> > +	struct mlxreg_core_data *reg_data = &wdt->pdata->data[wdt-
> >tleft_idx];
> > +	u32 regval;
> > +	int rc;
> > +
> > +	rc = regmap_read(wdt->regmap, reg_data->reg, &regval);
> > +
> > +	return rc == 0 ? regval : 0;
> 
> Just for clarification: If reading the time left returns an error, you don't
> want to tell userspace, but instead claim that there is no remaining timeout ?
> If so, please add a comment along that line describing the reasoning behind
> it.

Added comment about return value.

> 
> > +}
> > +
> > +static const struct watchdog_ops mlxreg_wdt_ops_type1 = {
> > +	.start		= mlxreg_wdt_start,
> > +	.stop		= mlxreg_wdt_stop,
> > +	.ping		= mlxreg_wdt_ping,
> > +	.set_timeout	= mlxreg_wdt_set_timeout,
> > +	.owner		= THIS_MODULE,
> > +};
> > +
> > +static const struct watchdog_ops mlxreg_wdt_ops_type2 = {
> > +	.start		= mlxreg_wdt_start,
> > +	.stop		= mlxreg_wdt_stop,
> > +	.ping		= mlxreg_wdt_ping,
> > +	.set_timeout	= mlxreg_wdt_set_timeout,
> > +	.get_timeleft	= mlxreg_wdt_get_timeleft,
> > +	.owner		= THIS_MODULE,
> > +};
> > +
> > +static const struct watchdog_info mlxreg_wdt_main_info = {
> > +	.options	= MLXREG_WDT_OPTIONS_BASE
> > +			| WDIOF_CARDRESET,
> > +	.identity	= "mlx-wdt-main",
> > +};
> > +
> > +static const struct watchdog_info mlxreg_wdt_aux_info = {
> > +	.options	= MLXREG_WDT_OPTIONS_BASE
> > +			| WDIOF_ALARMONLY,
> > +	.identity	= "mlx-wdt-aux",
> > +};
> > +
> > +static void mlxreg_wdt_config(struct mlxreg_wdt *wdt,
> > +			      struct mlxreg_core_platform_data *pdata)
> > +{
> > +	struct mlxreg_core_data *data = pdata->data;
> > +	int i;
> > +
> > +	wdt->reset_idx = -EINVAL;
> > +	for (i = 0; i < pdata->counter; i++, data++) {
> > +		if (strnstr(data->label, "action", sizeof(data->label)))
> > +			wdt->action_idx = i;
> > +		else if (strnstr(data->label, "timeout", sizeof(data->label)))
> > +			wdt->timeout_idx = i;
> > +		else if (strnstr(data->label, "timeleft", sizeof(data->label)))
> > +			wdt->tleft_idx = i;
> > +		else if (strnstr(data->label, "ping", sizeof(data->label)))
> > +			wdt->ping_idx = i;
> > +		else if (strnstr(data->label, "reset", sizeof(data->label)))
> > +			wdt->reset_idx = i;
> > +	}
> > +
> > +	wdt->pdata = pdata;
> > +	if (strnstr(pdata->identity, mlxreg_wdt_main_info.identity,
> > +		    sizeof(mlxreg_wdt_main_info.identity)))
> > +		wdt->wdd.info = &mlxreg_wdt_main_info;
> > +	else
> > +		wdt->wdd.info = &mlxreg_wdt_aux_info;
> > +
> > +	wdt->wdt_type = pdata->version;
> > +	if (wdt->wdt_type == MLX_WDT_TYPE2) {
> > +		wdt->wdd.ops = &mlxreg_wdt_ops_type2;
> > +		wdt->wdd.max_timeout =
> MLXREG_WDT_MAX_TIMEOUT_TYPE2;
> > +	} else {
> > +		wdt->wdd.ops = &mlxreg_wdt_ops_type1;
> > +		wdt->wdd.max_timeout =
> MLXREG_WDT_MAX_TIMEOUT_TYPE1;
> > +	}
> > +	wdt->wdd.min_timeout = MLXREG_WDT_MIN_TIMEOUT;
> > +}
> > +
> > +static int mlxreg_wdt_init_timeout(struct mlxreg_wdt *wdt,
> > +				   struct mlxreg_core_platform_data *pdata)
> > +{
> > +	u32 timeout;
> > +
> > +	timeout = pdata->data[wdt->timeout_idx].health_cntr;
> > +	return mlxreg_wdt_set_timeout(&wdt->wdd, timeout);
> > +}
> > +
> > +static int mlxreg_wdt_probe(struct platform_device *pdev)
> > +{
> > +	struct mlxreg_core_platform_data *pdata;
> > +	struct mlxreg_wdt *wdt;
> > +	int rc;
> > +
> > +	pdata = dev_get_platdata(&pdev->dev);
> > +	if (!pdata) {
> > +		dev_err(&pdev->dev, "Failed to get platform data.\n");
> > +		return -EINVAL;
> > +	}
> > +	wdt = devm_kzalloc(&pdev->dev, sizeof(*wdt), GFP_KERNEL);
> > +	if (!wdt)
> > +		return -ENOMEM;
> > +
> > +	wdt->wdd.parent = &pdev->dev;
> > +	wdt->regmap = pdata->regmap;
> > +	mlxreg_wdt_config(wdt, pdata);
> > +
> > +	if ((pdata->features & MLXREG_CORE_WD_FEATURE_NOWAYOUT))
> > +		watchdog_set_nowayout(&wdt->wdd,
> WATCHDOG_NOWAYOUT);
> > +	watchdog_stop_on_reboot(&wdt->wdd);
> > +	watchdog_stop_on_unregister(&wdt->wdd);
> > +	watchdog_set_drvdata(&wdt->wdd, wdt);
> > +	rc = mlxreg_wdt_init_timeout(wdt, pdata);
> > +	if (rc)
> > +		goto register_error;
> > +
> > +	if ((pdata->features &
> MLXREG_CORE_WD_FEATURE_START_AT_BOOT)) {
> > +		rc = mlxreg_wdt_start(&wdt->wdd);
> > +		if (rc)
> > +			goto register_error;
> > +		set_bit(WDOG_HW_RUNNING, &wdt->wdd.status);
> 
> As mentioned separately, this causes the watchdog core to ping the
> watchdog
> until the watchdog device is opened. If that is not what you want,
> it is ok to drop the above line, but please add a comment stating that
> you want the watchdog to reset the system if the watchdog device is not
> opened in time.
> 

Yes. It's safer to set it here.
I also added this explanation to mlx-wdt documentation.

> > +	}
> > +	mlxreg_wdt_check_card_reset(wdt);
> > +	rc = devm_watchdog_register_device(&pdev->dev, &wdt->wdd);
> > +
> > +register_error:
> > +	if (rc)
> > +		dev_err(&pdev->dev,
> > +			"Cannot register watchdog device (err=%d)\n", rc);
> > +	return rc;
> > +}
> > +
> > +static struct platform_driver mlxreg_wdt_driver = {
> > +	.probe	= mlxreg_wdt_probe,
> > +	.driver	= {
> > +			.name = "mlx-wdt",
> > +	},
> > +};
> > +
> > +module_platform_driver(mlxreg_wdt_driver);
> > +
> > +MODULE_AUTHOR("Michael Shych <michaelsh@xxxxxxxxxxxx>");
> > +MODULE_DESCRIPTION("Mellanox watchdog driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_ALIAS("platform:mlx-wdt");
> >

Guenter, you sent reviewed-by to regmap.h from patchset v2,
however, it was changed since then.
Could you please resend it?

Thanks, Michael.




[Index of Archives]     [Linux Kernel Development]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux