Re: [non-pretimeout,4/7] Watchdog: introduce ARM SBSA watchdog driver

Hi Guenter,

On 12 June 2015 at 00:28, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> On Thu, Jun 11, 2015 at 01:47:29AM +0800, fu.wei@xxxxxxxxxx wrote:
>> From: Fu Wei <fu.wei@xxxxxxxxxx>
>>
>> This driver bases on linux kernel watchdog framework.
>> It supports getting timeout from parameter and FDT
>> at the driver init stage.
>> The first timeout period expires, the interrupt routine
>> got another timeout period to run panic for saving
>> system context.
>>
> Comments inline.
>
> Thanks,
> Guenter
>
>> Signed-off-by: Fu Wei <fu.wei@xxxxxxxxxx>
>> ---
>>  drivers/watchdog/Kconfig     |  11 ++
>>  drivers/watchdog/Makefile    |   1 +
>>  drivers/watchdog/sbsa_gwdt.c | 383 +++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 395 insertions(+)
>>  create mode 100644 drivers/watchdog/sbsa_gwdt.c
>>
>> diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
>> index e5e7c55..554f18a 100644
>> --- a/drivers/watchdog/Kconfig
>> +++ b/drivers/watchdog/Kconfig
>> @@ -152,6 +152,17 @@ config ARM_SP805_WATCHDOG
>>         ARM Primecell SP805 Watchdog timer. This will reboot your system when
>>         the timeout is reached.
>>
>> +config ARM_SBSA_WATCHDOG
>> +     tristate "ARM SBSA Generic Watchdog"
>> +     depends on ARM64
>> +     depends on ARM_ARCH_TIMER
>> +     select WATCHDOG_CORE
>> +     help
>> +       ARM SBSA Generic Watchdog. This watchdog has two Watchdog timeouts.
>> +       The first timeout will trigger a panic; the second timeout will
>> +       trigger a system reset.
>> +       More details: ARM DEN0029B - Server Base System Architecture (SBSA)
>> +
>           To compile this driver as module, choose M here: The module
>           will be called sbsa_gwdt.

Thanks! Added it.

>
>>  config AT91RM9200_WATCHDOG
>>       tristate "AT91RM9200 watchdog"
>>       depends on SOC_AT91RM9200 && MFD_SYSCON
>> diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
>> index 5c19294..471f1b7c 100644
>> --- a/drivers/watchdog/Makefile
>> +++ b/drivers/watchdog/Makefile
>> @@ -30,6 +30,7 @@ obj-$(CONFIG_USBPCWATCHDOG) += pcwd_usb.o
>>
>>  # ARM Architecture
>>  obj-$(CONFIG_ARM_SP805_WATCHDOG) += sp805_wdt.o
>> +obj-$(CONFIG_ARM_SBSA_WATCHDOG) += sbsa_gwdt.o
>>  obj-$(CONFIG_AT91RM9200_WATCHDOG) += at91rm9200_wdt.o
>>  obj-$(CONFIG_AT91SAM9X_WATCHDOG) += at91sam9_wdt.o
>>  obj-$(CONFIG_CADENCE_WATCHDOG) += cadence_wdt.o
>> diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
>> new file mode 100644
>> index 0000000..1ddc10f
>> --- /dev/null
>> +++ b/drivers/watchdog/sbsa_gwdt.c
>> @@ -0,0 +1,383 @@
>> +/*
>> + * SBSA(Server Base System Architecture) Generic Watchdog driver
>> + *
>> + * Copyright (c) 2015, Linaro Ltd.
>> + * Author: Fu Wei <fu.wei@xxxxxxxxxx>
>> + *         Suravee Suthikulpanit <Suravee.Suthikulpanit@xxxxxxx>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License 2 as published
>> + * by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * Note: This SBSA Generic watchdog has two stage timeouts,
>
> s/This/The/
>
> "has two stages".
>
> I would suggest to drop "Note:", but that is up to you.

Thanks :-) Fixed it.

>
>> + *       When the first timeout occurs, WS0(SPI or LPI) is triggered,
>> + *       the second timeout period(as long as the first timeout period) starts.
>
> no longer accurate if WOR is used for the second period.
>
>> + *       In WS0 interrupt routine, panic() will be called for collecting
>> + *       crashdown info.
>> + *       If system can not recover from WS0 interrupt routine, then second
>> + *       timeout occurs, WS1(reset or higher level interrupt) is triggered.
>> + *       The two timeout period can be set by WOR(32bit).
>
> The second timeout period is determined by ...
>
>> + *       WOR gives a maximum watch period of around 10s at the maximum
>> + *       system counter frequency.
>> + *       The System Counter shall run at maximum of 400MHz.
>
> "... at the maximum system counter frequency of 400 MHz.", and drop the
> last sentence.

For the second timeout period, I have discussed this with a kdump
developer:
(1) 10s may not be long enough for every panic + kdump case, so we may
still need to use WCV in the second timeout period.
(2) In the second timeout period, we may need to program WCV for two
reasons: (a) to trigger WS1 and reboot the system ASAP; (b) to feed the
watchdog without clearing the WS0 flag.

Why would we want to feed the watchdog (keepalive) without clearing the
WS0 flag?
(1) If the system context is large, we may need to keep feeding the dog
until everything has been backed up.
(2) If the system goes wrong, WS0 is triggered, then panic --> kdump.
If we feed the dog through WRR or by programming WOR, the WS0 flag is
cleared. If the system goes wrong again, it panics again. The system
would then be stuck in a panic--kdump--panic--kdump loop, with no
chance to reset.

So while we are in the second timeout period, we may always need to
program WCV.
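
The behaviour described above can be modelled in user space. This is
only a sketch, not the driver code: struct gwdt_model and the model_*
helpers are hypothetical names, and the register side effects are my
reading of the SBSA description (an explicit refresh through WRR
reloads WCV from WOR and clears WS0/WS1; a direct WCV write changes
only the compare value).

```c
#include <assert.h>
#include <stdint.h>

#define WCS_EN  (1u << 0)
#define WCS_WS0 (1u << 1)
#define WCS_WS1 (1u << 2)

/* Hypothetical user-space model of the watchdog registers (not driver code). */
struct gwdt_model {
	uint32_t clk;	/* system counter frequency, in Hz */
	uint32_t wcs;	/* control/status bits: EN, WS0, WS1 */
	uint32_t wor;	/* 32-bit offset, in counter ticks */
	uint64_t wcv;	/* 64-bit compare value, in counter ticks */
};

/* Longest period (in whole seconds) that fits in the 32-bit WOR. */
static uint32_t model_max_wor_timeout(uint32_t clk)
{
	assert(clk != 0);	/* a zero frequency would divide by zero */
	return UINT32_MAX / clk;
}

/* Explicit refresh (e.g. a WRR write): reload WCV from WOR, clear WS0/WS1. */
static void model_refresh(struct gwdt_model *w, uint64_t now)
{
	w->wcv = now + w->wor;
	w->wcs &= ~(WCS_WS0 | WCS_WS1);
}

/* Direct WCV write: moves the deadline, leaves the status bits alone. */
static void model_write_wcv(struct gwdt_model *w, uint64_t wcv)
{
	w->wcv = wcv;
}

/*
 * Keepalive policy sketched in this mail: once WS0 has fired, only
 * program WCV, so that WS0 stays set and WS1 can still reset the system.
 */
static void model_keepalive(struct gwdt_model *w, uint64_t now,
			    uint32_t timeout_s)
{
	if (w->wcs & WCS_WS0)
		model_write_wcv(w, now + (uint64_t)timeout_s * w->clk);
	else
		model_refresh(w, now);
}
```

At 400 MHz, model_max_wor_timeout() returns 10, which matches the
"around 10s" limit in the header comment. Feeding through
model_keepalive() while WS0 is set pushes the deadline out and keeps
WS0 set, whereas model_refresh() would clear it.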

>
> Please uses spaces before '('.
>
>> + *
>> + *       But If we need a larger timeout period, this driver will programme WCV
>
> s/But //
> s/this/the/
> s/programme/program/
>
>> + *       directly. That can support more than 10s timeout at the maximum
>> + *       system counter frequency.
>
> Drop the last sentence.

Thanks, fixed it.

>
>> + *       More details: ARM DEN0029B - Server Base System Architecture (SBSA)
>> + *
>> + * SBSA GWDT:    |---WOR(or WCV)---WS0---WOR(or WCV)---WS1
>> + *               |-----timeout-----WS0-----timeout-----WS1
>> + */
>> +
>> +#include <linux/io.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/module.h>
>> +#include <linux/moduleparam.h>
>> +#include <linux/of.h>
>> +#include <linux/of_device.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/watchdog.h>
>> +#include <asm/arch_timer.h>
>> +
>> +/* SBSA Generic Watchdog register definitions */
>> +/* refresh frame */
>> +#define SBSA_GWDT_WRR                                0x000
>> +
>> +/* control frame */
>> +#define SBSA_GWDT_WCS                                0x000
>> +#define SBSA_GWDT_WOR                                0x008
>> +#define SBSA_GWDT_WCV_LO                     0x010
>> +#define SBSA_GWDT_WCV_HI                     0x014
>> +
>> +/* refresh/control frame */
>> +#define SBSA_GWDT_W_IIDR                     0xfcc
>> +#define SBSA_GWDT_IDR                                0xfd0
>> +
>> +/* Watchdog Control and Status Register */
>> +#define SBSA_GWDT_WCS_EN                     BIT(0)
>> +#define SBSA_GWDT_WCS_WS0                    BIT(1)
>> +#define SBSA_GWDT_WCS_WS1                    BIT(2)
>> +
>> +/**
>> + * struct sbsa_gwdt - Internal representation of the SBSA GWDT
>> + * @wdd:             kernel watchdog_device structure
>> + * @clk:             store the System Counter clock frequency, in Hz.
>> + * @max_wor_timeout: the maximum timeout value for WOR (in seconds).
>> + * @refresh_base:    Virtual address of the watchdog refresh frame
>> + * @control_base:    Virtual address of the watchdog control frame
>> + */
>> +struct sbsa_gwdt {
>> +     struct watchdog_device  wdd;
>> +     u32                     clk;
>> +     int                     max_wor_timeout;
>> +     void __iomem            *refresh_base;
>> +     void __iomem            *control_base;
>> +};
>> +
>> +#define to_sbsa_gwdt(e) container_of(e, struct sbsa_gwdt, wdd)
>> +
>> +#define DEFAULT_TIMEOUT              30 /* seconds */
>> +
>> +static unsigned int timeout;
>> +module_param(timeout, uint, 0);
>> +MODULE_PARM_DESC(timeout,
>> +              "Watchdog timeout in seconds. (>=0, default="
>> +              __MODULE_STRING(DEFAULT_TIMEOUT) ")");
>> +
>> +static bool nowayout = WATCHDOG_NOWAYOUT;
>> +module_param(nowayout, bool, S_IRUGO);
>> +MODULE_PARM_DESC(nowayout,
>> +              "Watchdog cannot be stopped once started (default="
>> +              __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
>> +
>> +/*
>> + * help functions for accessing 64bit WCV register
>> + */
>> +static u64 sbsa_gwdt_get_wcv(struct watchdog_device *wdd)
>> +{
>> +     u32 wcv_lo, wcv_hi;
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +
>> +     do {
>> +             wcv_hi = readl_relaxed(gwdt->control_base + SBSA_GWDT_WCV_HI);
>> +             wcv_lo = readl_relaxed(gwdt->control_base + SBSA_GWDT_WCV_LO);
>> +     } while (wcv_hi != readl_relaxed(gwdt->control_base +
>> +                                      SBSA_GWDT_WCV_HI));
>> +
>> +     return (((u64)wcv_hi << 32) | wcv_lo);
>> +}
>> +
>> +static void reload_timeout_to_wcv(struct watchdog_device *wdd)
>> +{
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +     u64 wcv;
>> +
>> +     wcv = arch_counter_get_cntvct() + (u64)wdd->timeout * gwdt->clk;
>> +
>> +     writel_relaxed(upper_32_bits(wcv),
>> +                    gwdt->control_base + SBSA_GWDT_WCV_HI);
>> +     writel_relaxed(lower_32_bits(wcv),
>> +                    gwdt->control_base + SBSA_GWDT_WCV_LO);
>> +}
>> +
>> +static int sbsa_gwdt_set_timeout(struct watchdog_device *wdd,
>> +                              unsigned int timeout)
>> +{
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +
>> +     wdd->timeout = timeout;
>> +
>> +     if (timeout <= gwdt->max_wor_timeout)
>> +             writel_relaxed(timeout * gwdt->clk,
>> +                            gwdt->control_base + SBSA_GWDT_WOR);
>> +     else
>> +             writel_relaxed(gwdt->max_wor_timeout * gwdt->clk,
>> +                            gwdt->control_base + SBSA_GWDT_WOR);
>> +
>
> This can be simplified a bit to
>         if (timeout > gwdt->max_wor_timeout)
>                 timeout = gwdt->max_wor_timeout;
>         writel_relaxed(timeout * gwdt->clk,
>                        gwdt->control_base + SBSA_GWDT_WOR);

Yes, good idea, thanks. Fixed.

>
>> +     return 0;
>> +}
>> +
>> +static unsigned int sbsa_gwdt_get_timeleft(struct watchdog_device *wdd)
>> +{
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +     u64 timeleft = sbsa_gwdt_get_wcv(wdd) - arch_counter_get_cntvct();
>> +
>> +     do_div(timeleft, gwdt->clk);
>> +
>> +     return timeleft;
>> +}
>> +
>> +static int sbsa_gwdt_keepalive(struct watchdog_device *wdd)
>> +{
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +
>> +     if (wdd->timeout <= gwdt->max_wor_timeout)
>> +             /*
>> +              * Writing WRR for an explicit watchdog refresh.
>> +              * You can write anyting(like 0xc0ffee).
>> +              */
>> +             writel_relaxed(0xc0ffee, gwdt->refresh_base + SBSA_GWDT_WRR);
>> +     else
>> +             reload_timeout_to_wcv(wdd);
>> +
>> +     return 0;
>> +}
>> +
>> +static int sbsa_gwdt_start(struct watchdog_device *wdd)
>> +{
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +     /* Force refresh due to hardware bug found in certain Soc. */
>
> Can you specify which SOC(s) are known to need this, and explain the bug
> a bit better ?

Please ignore this. I deleted it after discussing it with an engineer
from that chip vendor.
We don't need it now.

>
>> +     writel_relaxed(0xc0ffee, gwdt->refresh_base + SBSA_GWDT_WRR);
>> +     /* writing WCS will cause an explicit watchdog refresh */
>> +     writel_relaxed(SBSA_GWDT_WCS_EN, gwdt->control_base + SBSA_GWDT_WCS);
>> +
>> +     return sbsa_gwdt_keepalive(wdd);
>> +}
>> +
>> +static int sbsa_gwdt_stop(struct watchdog_device *wdd)
>> +{
>> +     struct sbsa_gwdt *gwdt = to_sbsa_gwdt(wdd);
>> +
>> +     writel_relaxed(0, gwdt->control_base + SBSA_GWDT_WCS);
>> +
>> +     return 0;
>> +}
>> +
>> +static irqreturn_t sbsa_gwdt_interrupt(int irq, void *dev_id)
>> +{
>> +     struct sbsa_gwdt *gwdt = (struct sbsa_gwdt *)dev_id;
>> +     struct watchdog_device *wdd = &gwdt->wdd;
>> +
>> +     if (wdd->timeout > gwdt->max_wor_timeout)
>> +             reload_timeout_to_wcv(wdd);
>> +
> Please drop the above.

As I mentioned above, I think we can keep this.
But please check my new patchset for this support.

>
>> +     panic("SBSA Watchdog pre-timeout");
>> +
>> +     return IRQ_HANDLED;
>> +}
>> +
>> +static struct watchdog_info sbsa_gwdt_info = {
>> +     .identity       = "SBSA Generic Watchdog",
>> +     .options        = WDIOF_SETTIMEOUT |
>> +                       WDIOF_KEEPALIVEPING |
>> +                       WDIOF_MAGICCLOSE |
>> +                       WDIOF_CARDRESET,
>> +};
>> +
>> +static struct watchdog_ops sbsa_gwdt_ops = {
>> +     .owner          = THIS_MODULE,
>> +     .start          = sbsa_gwdt_start,
>> +     .stop           = sbsa_gwdt_stop,
>> +     .ping           = sbsa_gwdt_keepalive,
>> +     .set_timeout    = sbsa_gwdt_set_timeout,
>> +     .get_timeleft   = sbsa_gwdt_get_timeleft,
>> +};
>> +
>> +static int sbsa_gwdt_probe(struct platform_device *pdev)
>> +{
>> +     u64 first_period_max = U64_MAX;
>> +     struct device *dev = &pdev->dev;
>> +     struct watchdog_device *wdd;
>> +     struct sbsa_gwdt *gwdt;
>> +     struct resource *res;
>> +     void *rf_base, *cf_base;
>> +     int ret, irq;
>> +     u32 status;
>> +
>> +     gwdt = devm_kzalloc(dev, sizeof(*gwdt), GFP_KERNEL);
>> +     if (!gwdt)
>> +             return -ENOMEM;
>> +     platform_set_drvdata(pdev, gwdt);
>> +
>> +     res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "refresh");
>> +     rf_base = devm_ioremap_resource(dev, res);
>> +     if (IS_ERR(rf_base))
>> +             return PTR_ERR(rf_base);
>> +
>> +     res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "control");
>> +     cf_base = devm_ioremap_resource(dev, res);
>> +     if (IS_ERR(cf_base))
>> +             return PTR_ERR(cf_base);
>> +
>> +     irq = platform_get_irq_byname(pdev, "ws0");
>> +     if (irq < 0) {
>> +             dev_err(dev, "unable to get ws0 interrupt.\n");
>> +             return irq;
>> +     }
>> +
>> +     /*
>> +      * Get the frequency of system counter from the cp15 interface of ARM
>> +      * Generic timer. We don't need to check it, because if it returns "0",
>> +      * system would panic in very early stage.
>> +      */
>> +     gwdt->clk = arch_timer_get_cntfrq();
>> +     gwdt->refresh_base = rf_base;
>> +     gwdt->control_base = cf_base;
>> +     gwdt->max_wor_timeout = U32_MAX / gwdt->clk;
>> +
>> +     wdd = &gwdt->wdd;
>> +     wdd->parent = dev;
>> +     wdd->info = &sbsa_gwdt_info;
>> +     wdd->ops = &sbsa_gwdt_ops;
>> +     watchdog_set_drvdata(wdd, gwdt);
>> +     watchdog_set_nowayout(wdd, nowayout);
>> +
>> +     wdd->min_timeout = 1;
>> +     do_div(first_period_max, gwdt->clk);
>> +     wdd->max_timeout = first_period_max;
>> +
>> +     wdd->timeout = DEFAULT_TIMEOUT;
>> +     watchdog_init_timeout(wdd, timeout, dev);
>> +
>> +     status = readl_relaxed(gwdt->control_base + SBSA_GWDT_WCS);
>> +     if (status & SBSA_GWDT_WCS_WS1) {
>> +             dev_warn(dev, "System reset by WDT(WCV: %llx)\n",
>> +                      sbsa_gwdt_get_wcv(wdd));
>
> WCV here only tells us how many clock cycles were executed since the
> system started (or something like that). So I still don't understand
> why it is valuable to print that number.

This number provides the time of the system reset. I think that may
help the admin analyse the system failure.
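
To make the number easier to read, the raw WCV value could be
converted to seconds before printing. A minimal sketch, assuming WCV
counts system counter ticks and that the counter started near power-on
(both are assumptions on my side); wcv_ticks_to_seconds() is a
hypothetical helper, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a raw WCV value (system counter ticks)
 * into whole seconds, given the counter frequency in Hz.
 */
static uint64_t wcv_ticks_to_seconds(uint64_t wcv, uint32_t clk_hz)
{
	assert(clk_hz != 0);	/* a zero frequency would divide by zero */
	return wcv / clk_hz;
}
```

For example, a WCV of 10,000,000,000 ticks at 100 MHz corresponds to
100 seconds. In the driver itself the 64-bit division would go through
do_div(), as sbsa_gwdt_get_timeleft() already does.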

>
>> +             wdd->bootstatus |= WDIOF_CARDRESET;
>> +     }
>> +     /* Check if watchdog is already enabled */
>> +     if (status & SBSA_GWDT_WCS_EN) {
>> +             dev_warn(dev, "already enabled\n");
>> +             sbsa_gwdt_keepalive(wdd);
>> +     }
>
> Can you merge the message with the info message below ?
> Something like
>         dev_info(dev, "Initialized with %ds timeout @ %u Hz%s\n", wdd->timeout,
>                  gwdt->clk, status & SBSA_GWDT_WCS_EN ? " [enabled]" : "");
>
> I don't think that should be a warning.

Yes, good idea, will do.

>
>> +
>> +     /* update timeout to WOR */
>> +     sbsa_gwdt_set_timeout(wdd, wdd->timeout);
>> +
>
> That will trigger a refresh if the watchdog is active, meaning the timeout
> will occur at time + WOR, not at time + timeout. I think keepalive has to be
> called later, preferrably after calling watchdog_register_device().

Yes, you are right, will fix it.
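
To check my understanding of the ordering problem, here is a small
user-space model. It is only a sketch under the assumption that a WOR
write refreshes an already enabled watchdog; struct wdt_state and the
state_* helpers are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: only the fields needed to show the ordering issue. */
struct wdt_state {
	uint32_t clk;		/* counter frequency, in Hz */
	uint32_t wor;		/* offset register, in ticks */
	uint64_t wcv;		/* compare value (deadline), in ticks */
	int enabled;		/* non-zero if the watchdog is running */
};

/* Writing WOR refreshes an enabled watchdog: deadline = now + WOR. */
static void state_write_wor(struct wdt_state *s, uint64_t now, uint32_t ticks)
{
	s->wor = ticks;
	if (s->enabled)
		s->wcv = now + s->wor;
}

/* Keepalive for a long timeout: program the full deadline into WCV. */
static void state_keepalive_wcv(struct wdt_state *s, uint64_t now,
				uint32_t timeout_s)
{
	s->wcv = now + (uint64_t)timeout_s * s->clk;
}

/* Whole seconds left until the deadline, as seen at time 'now'. */
static uint64_t state_timeleft(const struct wdt_state *s, uint64_t now)
{
	assert(s->clk != 0);	/* a zero frequency would divide by zero */
	return (s->wcv - now) / s->clk;
}
```

With a 100 MHz counter and a 60s timeout, WOR can hold at most 42s.
After state_write_wor() alone the deadline is 42s away; only a later
state_keepalive_wcv() call moves it to the requested 60s, which is why
the keepalive has to come after the WOR update.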

>
>> +     ret = devm_request_irq(dev, irq, sbsa_gwdt_interrupt, 0,
>> +                            pdev->name, gwdt);
>> +     if (ret) {
>> +             dev_err(dev, "unable to request IRQ %d\n", irq);
>> +             return ret;
>> +     }
>> +
>> +     ret = watchdog_register_device(wdd);
>> +     if (ret)
>> +             return ret;
>> +
>> +     dev_info(dev, "Initialized with %ds timeout @ %u Hz\n", wdd->timeout,
>> +              gwdt->clk);
>> +
>> +     return 0;
>> +}
>> +
>> +static void sbsa_gwdt_shutdown(struct platform_device *pdev)
>> +{
>> +     struct sbsa_gwdt *gwdt = platform_get_drvdata(pdev);
>> +
>> +     sbsa_gwdt_stop(&gwdt->wdd);
>> +}
>> +
>> +static int sbsa_gwdt_remove(struct platform_device *pdev)
>> +{
>> +     struct sbsa_gwdt *gwdt = platform_get_drvdata(pdev);
>> +
>> +     watchdog_unregister_device(&gwdt->wdd);
>> +
>> +     return 0;
>> +}
>> +
>> +/* Disable watchdog if it is active during suspend */
>> +static int __maybe_unused sbsa_gwdt_suspend(struct device *dev)
>> +{
>> +     struct sbsa_gwdt *gwdt = dev_get_drvdata(dev);
>> +
>> +     if (watchdog_active(&gwdt->wdd))
>> +             sbsa_gwdt_stop(&gwdt->wdd);
>> +
>> +     return 0;
>> +}
>> +
>> +/* Enable watchdog and configure it if necessary */
>> +static int __maybe_unused sbsa_gwdt_resume(struct device *dev)
>> +{
>> +     struct sbsa_gwdt *gwdt = dev_get_drvdata(dev);
>> +
>> +     if (watchdog_active(&gwdt->wdd))
>> +             sbsa_gwdt_start(&gwdt->wdd);
>> +
>> +     return 0;
>> +}
>> +
>> +static const struct dev_pm_ops sbsa_gwdt_pm_ops = {
>> +     SET_SYSTEM_SLEEP_PM_OPS(sbsa_gwdt_suspend, sbsa_gwdt_resume)
>> +};
>> +
>> +static const struct of_device_id sbsa_gwdt_of_match[] = {
>> +     { .compatible = "arm,sbsa-gwdt", },
>> +     {},
>> +};
>> +MODULE_DEVICE_TABLE(of, sbsa_gwdt_of_match);
>> +
>> +static const struct platform_device_id sbsa_gwdt_pdev_match[] = {
>> +     { .name = "sbsa-gwdt", },
>> +     {},
>> +};
>> +MODULE_DEVICE_TABLE(platform, sbsa_gwdt_pdev_match);
>> +
>> +static struct platform_driver sbsa_gwdt_driver = {
>> +     .driver = {
>> +             .name = "sbsa-gwdt",
>> +             .pm = &sbsa_gwdt_pm_ops,
>> +             .of_match_table = sbsa_gwdt_of_match,
>> +     },
>> +     .probe = sbsa_gwdt_probe,
>> +     .remove = sbsa_gwdt_remove,
>> +     .shutdown = sbsa_gwdt_shutdown,
>> +     .id_table = sbsa_gwdt_pdev_match,
>> +};
>> +
>> +module_platform_driver(sbsa_gwdt_driver);
>> +
>> +MODULE_DESCRIPTION("SBSA Generic Watchdog Driver");
>> +MODULE_VERSION("v1.0");
>
> Version numbers tend to be out of date constantly, and there is no well
> defined mechanism or protocol when increase them. I would suggest to drop it.

OK, will drop it.

>
>> +MODULE_AUTHOR("Fu Wei <fu.wei@xxxxxxxxxx>");
>> +MODULE_AUTHOR("Suravee Suthikulpanit <Suravee.Suthikulpanit@xxxxxxx>");
>> +MODULE_LICENSE("GPL v2");



-- 
Best regards,

Fu Wei
Software Engineer
Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch
Ph: +86 21 61221326(direct)
Ph: +86 186 2020 4684 (mobile)
Room 1512, Regus One Corporate Avenue,Level 15,
One Corporate Avenue,222 Hubin Road,Huangpu District,
Shanghai,China 200021