Re: [PATCH v2 2/3] drivers: watchdog: Add StarFive Watchdog driver

Guenter Roeck <linux@xxxxxxxxxxxx> · Mon, 13 Feb 2023 07:26:29 -0800

On 2/13/23 05:29, Xingyu Wu wrote:
On 2023/2/10 23:28, Guenter Roeck wrote:
On 2/9/23 23:01, Xingyu Wu wrote:
On 2023/2/2 6:46, Guenter Roeck wrote:
On Mon, Dec 19, 2022 at 05:42:32PM +0800, Xingyu Wu wrote:
Add watchdog driver for the StarFive JH7110 SoC.

Signed-off-by: Xingyu Wu <xingyu.wu@xxxxxxxxxxxxxxxx>

[...]

+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
+         __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+MODULE_PARM_DESC(soft_noboot,
+         "Watchdog action, set to 1 to ignore reboots, 0 to reboot (default 0)");

I do not understand what this module parameter is supposed to be used for,
and what the "soft_' prefix is supposed to mean.

This 'soft_noboot' means watchdog reset enabled status. If 'soft_noboot' is set to 1,
it means reset is disabled and do not reboot.Or 'reboot_disbled' instead?

That means it does nothing ? Why load the watchdog in the first place then ?

This feature is used for debugging, so user can check the counter repeatedly
without rebooting the system.

Debug options are not acceptable as module options. Please make it a define
if you think you need it.

[...]
+
+/* interrupt status whether has been raised from the counter */
+static bool starfive_wdt_raise_irq_status(struct starfive_wdt *wdt)
+{
+    return !!readl(wdt->base + wdt->drv_data->irq_is_raise);
+}
+
+static void starfive_wdt_int_clr(struct starfive_wdt *wdt)
+{
+    writel(STARFIVE_WDT_INTCLR, wdt->base + wdt->drv_data->int_clr);
+}

There is no explanation for this interrupt handling (or, rather,
non-handling since there is no interrupt handler. What is the purpose
of even having all this code ?

Because the watchdog raise an interrupt signal on the hardware when timeout,
although we do not use interrupt handler on the sorfware, but the watchdog
initialization or reload also need to clear the hardware interrupt signal.

That should be documented.

I will add that in the comments.

[...]
+
+    /*
+     * This watchdog takes twice timeouts to reset.
+     * In order to reduce time to reset, should set half count value.
+     */
+    count = timeout * freq / 2;
+
+    if (count > STARFIVE_WDT_MAXCNT) {

count is an unsigned int, which is pretty much everywhere a 32-bit
value. STARFIVE_WDT_MAXCNT is 0xffffffff. How is an unsigned integer
ever supposed to be larger than 0xffffffff ?

+        dev_warn(wdt->dev, "timeout %d too big,use the MAX-timeout set.\n",
+             timeout);
+        timeout = starfive_wdt_max_timeout(wdt);
+        count = timeout * freq;

This is confusing. First, the timeout range is checked by the calling code,
which makes sure it is never larger than max_timeout. max_timeout is
set to the value returned by starfive_wdt_max_timeout().
Thus, count = timeout * freq / 2 will _never_ be larger than
STARFIVE_WDT_MAXCNT. Even if it ever was, it doesn't make sense
to use a count value of timeout * freq in that case, ie a value which
could be twice as large as the supposed maximum value.

Change 'count' type to 'u64'. And if 'count' is larger than STARFIVE_WDT_MAXCNT,
'count' is equal to STARFIVE_WDT_MAXCNT. Does that seem OK?

That would not change anything. This is not about overflows; I would
have mentioned that. count can still never be larger than STARFIVE_WDT_MAXCNT.
Please do the math.

I get your point. It has checked the 'count' before the ops of 'set_timeout' so
I check the 'count' uselessly in this. I will remove this.

[...]
+
+    if (tmr_atboot && started == 0) {
+        starfive_wdt_start(&wdt->wdt_device);
+    } else if (!tmr_atboot) {
+        /*
+         *if we're not enabling the watchdog, then ensure it is
+         * disabled if it has been left running from the bootloader
+         * or other source.
+         */
+        starfive_wdt_stop(&wdt->wdt_device);

If it _is_ running from the boot loader, the watchdog core is not
informed about it. If it is started here, it is not informed either.
This is unusual and will need to be explained.
Why ?

Is is okay to remove the 'started'?

Yes, though of course it would be better if the watchdog is kept running
in that situation and the watchdog core is informed about it. Also,
the watchdog core is still not informed that the watchdog is running
(i.e., WDOG_HW_RUNNING is not set) when it is explicitly started.

Will remove the 'started'.

+    }
+
+#ifdef CONFIG_PM
+    clk_disable_unprepare(wdt->core_clk);
+    clk_disable_unprepare(wdt->apb_clk);
+#endif

I do not understand the above code. Why stop the watchdog if CONFIG_PM
is enabled, even if it is supposed to be running ?

There misses a check about 'early_enable' and add 'if (!early_enable)'.
There means that disable clock when watchdog sleep and CONFIG_PM is enable.
Then enable clock when watchdog work by 'starfive_wdt_runtime_resume' function.

I am quite sure that you are supposed to use pm functions for that purpose,
such as pm_runtime_get_sync(), pm_runtime_put_sync(), and pm_runtime_enable(),
similar to the code in omap_wdt.c.

I will use pm_runtime_get_sync() and pm_runtime_put_sync() to operate clocks.

[...]
+#ifdef CONFIG_PM_SLEEP
+static int starfive_wdt_suspend(struct device *dev)
+{
+    int ret;
+    struct starfive_wdt *wdt = dev_get_drvdata(dev);
+
+    starfive_wdt_unlock(wdt);
+
+    /* Save watchdog state, and turn it off. */
+    wdt->reload = starfive_wdt_get_count(wdt);
+
+    starfive_wdt_mask_and_disable_reset(wdt, true);
+
+    /* Note that WTCNT doesn't need to be saved. */
+    starfive_wdt_stop(&wdt->wdt_device);
+    pm_runtime_force_suspend(dev);
+
+    starfive_wdt_lock(wdt);
+
+    return 0;
+}
+
+static int starfive_wdt_resume(struct device *dev)
+{
+    int ret;
+    struct starfive_wdt *wdt = dev_get_drvdata(dev);
+
+    starfive_wdt_unlock(wdt);
+
+    pm_runtime_force_resume(dev);
+
+    /* Restore watchdog state. */
+    starfive_wdt_set_reload_count(wdt, wdt->reload);
+
+    starfive_wdt_restart(&wdt->wdt_device, 0, NULL);

I do not understand this call. Per its definition it is a restart handler,
supposed to restart the hardware. Why would anyone want to restart the
system on resume ?

The watchdog start counting from 'count' to 0 on everytime resume like a restart.
So I directly use a restart.

That doesn't answer my question. The "restart" callback resets the hardware.
starfive_wdt_restart() is registered as restart handler, and thus expected
to reset the hardware. It it doesn't reset the hardware, it should not
register itself as restart handler. If it does restart the hardware, calling
it on resume would automatically reset the system on each resume.
Something is wrong here, and will have to be fixed.

I _suspect_ that you think that the restart callback is supposed to reset
the watchdog. That would be wrong. It resets (restarts) the hardware,
not the watchdog. Please read Documentation/watchdog/watchdog-kernel-api.rst
if there are questions about this callback.

Thanks you for reminding me. I finally understand that the restart callback is supposed
to reset the hardware not watchdog. This watchdog doesn't reset the hardware, and I will
remove that. By the way, if I don't need restart callback, would I still use the
'watchdog_set_restart_priority' function? Thanks.

No.

Thanks,
Guenter

Best regards,
Xingyu Wu