Re: [PATCH v2 2/3] drivers: watchdog: Add StarFive Watchdog driver

Xingyu Wu <xingyu.wu@xxxxxxxxxxxxxxxx> · Mon, 13 Feb 2023 21:29:10 +0800

On 2023/2/10 23:28, Guenter Roeck wrote:
> On 2/9/23 23:01, Xingyu Wu wrote:
>> On 2023/2/2 6:46, Guenter Roeck wrote:
>>> On Mon, Dec 19, 2022 at 05:42:32PM +0800, Xingyu Wu wrote:
>>>> Add watchdog driver for the StarFive JH7110 SoC.
>>>>
>>>> Signed-off-by: Xingyu Wu <xingyu.wu@xxxxxxxxxxxxxxxx>
>>>
>
>[...]
>
>>>
>>>> +MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
>>>> +         __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
>>>> +MODULE_PARM_DESC(soft_noboot,
>>>> +         "Watchdog action, set to 1 to ignore reboots, 0 to reboot (default 0)");
>>>
>>> I do not understand what this module parameter is supposed to be used for,
>>> and what the "soft_' prefix is supposed to mean.
>>
>> This 'soft_noboot' means watchdog reset enabled status. If 'soft_noboot' is set to 1,
>> it means reset is disabled and do not reboot.Or 'reboot_disbled' instead?
>>
> 
> That means it does nothing ? Why load the watchdog in the first place then ?
> 

This feature is used for debugging, so user can check the counter repeatedly
without rebooting the system.

>[...]
>>>> +
>>>> +/* interrupt status whether has been raised from the counter */
>>>> +static bool starfive_wdt_raise_irq_status(struct starfive_wdt *wdt)
>>>> +{
>>>> +    return !!readl(wdt->base + wdt->drv_data->irq_is_raise);
>>>> +}
>>>> +
>>>> +static void starfive_wdt_int_clr(struct starfive_wdt *wdt)
>>>> +{
>>>> +    writel(STARFIVE_WDT_INTCLR, wdt->base + wdt->drv_data->int_clr);
>>>> +}
>>>
>>> There is no explanation for this interrupt handling (or, rather,
>>> non-handling since there is no interrupt handler. What is the purpose
>>> of even having all this code ?
>>
>> Because the watchdog raise an interrupt signal on the hardware when timeout,
>> although we do not use interrupt handler on the sorfware, but the watchdog
>> initialization or reload also need to clear the hardware interrupt signal.
>>
> 
> That should be documented.
> 

I will add that in the comments.

>
>[...]
>>>> +
>>>> +    /*
>>>> +     * This watchdog takes twice timeouts to reset.
>>>> +     * In order to reduce time to reset, should set half count value.
>>>> +     */
>>>> +    count = timeout * freq / 2;
>>>> +
>>>> +    if (count > STARFIVE_WDT_MAXCNT) {
>>>
>>> count is an unsigned int, which is pretty much everywhere a 32-bit
>>> value. STARFIVE_WDT_MAXCNT is 0xffffffff. How is an unsigned integer
>>> ever supposed to be larger than 0xffffffff ?
>>>
>>>> +        dev_warn(wdt->dev, "timeout %d too big,use the MAX-timeout set.\n",
>>>> +             timeout);
>>>> +        timeout = starfive_wdt_max_timeout(wdt);
>>>> +        count = timeout * freq;
>>>
>>> This is confusing. First, the timeout range is checked by the calling code,
>>> which makes sure it is never larger than max_timeout. max_timeout is
>>> set to the value returned by starfive_wdt_max_timeout().
>>> Thus, count = timeout * freq / 2 will _never_ be larger than
>>> STARFIVE_WDT_MAXCNT. Even if it ever was, it doesn't make sense
>>> to use a count value of timeout * freq in that case, ie a value which
>>> could be twice as large as the supposed maximum value.
>>
>> Change 'count' type to 'u64'. And if 'count' is larger than STARFIVE_WDT_MAXCNT,
>> 'count' is equal to STARFIVE_WDT_MAXCNT. Does that seem OK?
>>
> 
> That would not change anything. This is not about overflows; I would
> have mentioned that. count can still never be larger than STARFIVE_WDT_MAXCNT.
> Please do the math.
> 

I get your point. It has checked the 'count' before the ops of 'set_timeout' so
I check the 'count' uselessly in this. I will remove this.

>
>[...]
>>>> +
>>>> +    if (tmr_atboot && started == 0) {
>>>> +        starfive_wdt_start(&wdt->wdt_device);
>>>> +    } else if (!tmr_atboot) {
>>>> +        /*
>>>> +         *if we're not enabling the watchdog, then ensure it is
>>>> +         * disabled if it has been left running from the bootloader
>>>> +         * or other source.
>>>> +         */
>>>> +        starfive_wdt_stop(&wdt->wdt_device);
>>>
>>> If it _is_ running from the boot loader, the watchdog core is not
>>> informed about it. If it is started here, it is not informed either.
>>> This is unusual and will need to be explained.
>>> Why ?
>>
>> Is is okay to remove the 'started'?
>>
> Yes, though of course it would be better if the watchdog is kept running
> in that situation and the watchdog core is informed about it. Also,
> the watchdog core is still not informed that the watchdog is running
> (i.e., WDOG_HW_RUNNING is not set) when it is explicitly started.
> 

Will remove the 'started'.

>>>
>>>> +    }
>>>> +
>>>> +#ifdef CONFIG_PM
>>>> +    clk_disable_unprepare(wdt->core_clk);
>>>> +    clk_disable_unprepare(wdt->apb_clk);
>>>> +#endif
>>>
>>> I do not understand the above code. Why stop the watchdog if CONFIG_PM
>>> is enabled, even if it is supposed to be running ?
>>
>> There misses a check about 'early_enable' and add 'if (!early_enable)'.
>> There means that disable clock when watchdog sleep and CONFIG_PM is enable.
>> Then enable clock when watchdog work by 'starfive_wdt_runtime_resume' function.
>>
> 
> I am quite sure that you are supposed to use pm functions for that purpose,
> such as pm_runtime_get_sync(), pm_runtime_put_sync(), and pm_runtime_enable(),
> similar to the code in omap_wdt.c.
> 

I will use pm_runtime_get_sync() and pm_runtime_put_sync() to operate clocks.

>[...]
>>>> +#ifdef CONFIG_PM_SLEEP
>>>> +static int starfive_wdt_suspend(struct device *dev)
>>>> +{
>>>> +    int ret;
>>>> +    struct starfive_wdt *wdt = dev_get_drvdata(dev);
>>>> +
>>>> +    starfive_wdt_unlock(wdt);
>>>> +
>>>> +    /* Save watchdog state, and turn it off. */
>>>> +    wdt->reload = starfive_wdt_get_count(wdt);
>>>> +
>>>> +    starfive_wdt_mask_and_disable_reset(wdt, true);
>>>> +
>>>> +    /* Note that WTCNT doesn't need to be saved. */
>>>> +    starfive_wdt_stop(&wdt->wdt_device);
>>>> +    pm_runtime_force_suspend(dev);
>>>> +
>>>> +    starfive_wdt_lock(wdt);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int starfive_wdt_resume(struct device *dev)
>>>> +{
>>>> +    int ret;
>>>> +    struct starfive_wdt *wdt = dev_get_drvdata(dev);
>>>> +
>>>> +    starfive_wdt_unlock(wdt);
>>>> +
>>>> +    pm_runtime_force_resume(dev);
>>>> +
>>>> +    /* Restore watchdog state. */
>>>> +    starfive_wdt_set_reload_count(wdt, wdt->reload);
>>>> +
>>>> +    starfive_wdt_restart(&wdt->wdt_device, 0, NULL);
>>>
>>> I do not understand this call. Per its definition it is a restart handler,
>>> supposed to restart the hardware. Why would anyone want to restart the
>>> system on resume ?
>>
>> The watchdog start counting from 'count' to 0 on everytime resume like a restart.
>> So I directly use a restart.
>>
> 
> That doesn't answer my question. The "restart" callback resets the hardware.
> starfive_wdt_restart() is registered as restart handler, and thus expected
> to reset the hardware. It it doesn't reset the hardware, it should not
> register itself as restart handler. If it does restart the hardware, calling
> it on resume would automatically reset the system on each resume.
> Something is wrong here, and will have to be fixed.
> 
> I _suspect_ that you think that the restart callback is supposed to reset
> the watchdog. That would be wrong. It resets (restarts) the hardware,
> not the watchdog. Please read Documentation/watchdog/watchdog-kernel-api.rst
> if there are questions about this callback.
> 

Thanks you for reminding me. I finally understand that the restart callback is supposed
to reset the hardware not watchdog. This watchdog doesn't reset the hardware, and I will
remove that. By the way, if I don't need restart callback, would I still use the 
'watchdog_set_restart_priority' function? Thanks.

Best regards,
Xingyu Wu