Re: [PATCH 2/2] watchdog: Add Realtek Otto watchdog timer

Sander Vanheule <sander@xxxxxxxxxxxxx> · Thu, 14 Oct 2021 12:26:59 +0200

On Wed, 2021-10-13 at 14:03 -0700, Guenter Roeck wrote:
> On 10/13/21 12:46 PM, Sander Vanheule wrote:
> > On Wed, 2021-10-13 at 11:48 -0700, Guenter Roeck wrote:
> > > On Wed, Oct 13, 2021 at 03:29:00PM +0200, Sander Vanheule wrote:
> > [...]
> > 
> > > > 
> > > > diff --git a/drivers/watchdog/realtek_otto_wdt.c
> > > > b/drivers/watchdog/realtek_otto_wdt.c
> > > > new file mode 100644
> > > > index 000000000000..64c9cba6b0b1
> > > > --- /dev/null
> > > > +++ b/drivers/watchdog/realtek_otto_wdt.c
> > > > @@ -0,0 +1,411 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > +
> > > > +/*
> > > > + * Realtek Otto MIPS platform watchdog
> > > > + *
> > > > + * Watchdog timer that will reset the system after timeout, using the
> > > > selected
> > > > + * reset mode.
> > > > + *
> > > > + * Counter scaling and timeouts:
> > > > + * - Base prescale of (2 << 25), providing tick duration T_0: 168ms @
> > > > 200MHz
> > > > + * - PRESCALE: logarithmic prescaler adding a factor of {1, 2, 4, 8}
> > > > + * - Phase 1: Times out after (PHASE1 + 1) × PRESCALE × T_0
> > > > + *   Generates an interrupt, WDT cannot be stopped after phase 1
> > > > + * - Phase 2: starts after phase 1, times out after (PHASE2 + 1) ×
> > > > PRESCALE × T_0
> > > > + *   Resets the system according to RST_MODE
> > > 
> > > Why is there a phase2 interrupt if phase2 resets the chip ?
> > > 
> > 
> > The SoC's reset controller has an interrupt line for phase2, even though
> > then it then the
> > WDT also resets the system. I don't have any documentation about this
> > peripheral; just
> > some vendor code and there the phase2 interrupt isn't enabled. I mainly
> > added it here for
> > completeness.
> > 
> 
> It seems pointless to mandate an interrupt just for completeness.

Okay, then I will just drop it here. As I understand, the bindings should be as
complete as possible, so I think the phase2 interrupt definition should remain
there?

> 
> > One thing to note is that after CPU or software reset (not SoC reset) the
> > phase2 flag in
> > OTTO_WDT_REG_INTR will be set. That's why I always clear it in
> > otto_wdt_probe(), because
> > otherwise enabling the interrupt line would trigger otto_wdt_phase2_isr().
> > On warm
> > restarts this bit could be used to determine if there was a WDT timeout, but
> > not if the
> > WDT is configured for cold restarts (i.e. full SoC reset).
> > 
> > > 
> > [...]
> > > > +
> > > > +       raw_spin_lock_irqsave(&ctrl->lock, flags);
> > > > +       v = ioread32(ctrl->base + OTTO_WDT_REG_CTRL);
> > > > +       v |= OTTO_WDT_CTRL_ENABLE;
> > > > +       iowrite32(v, ctrl->base + OTTO_WDT_REG_CTRL);
> > > > +       raw_spin_unlock_irqrestore(&ctrl->lock, flags);
> > > 
> > > Is it really necessary to disable interrupts for those operations ?
> > 
> > The ISR routines only use REG_INTR, which isn't modified anywhere else
> > (outside of probing
> > the device). I will replace these with raw_spin_{lock,unlock} throughout.
> > 
> 
> In that case you should not need any locks at all since the watchdog core
> ensures
> that the device is opened only once (and thus only one entity can enable or
> disable
> the watchdog).

If there is an external guarantee that at most one of {set_timeout,
set_pretimeout, enable, disable} will be called at a time, I can indeed drop the
lock. I had added the lock initially because of the read-modify-write operations
on the control register these ops perform.

> 
> > [...]
> > > > +/*
> > > > + * The timer asserts the PHASE1/PHASE2 IRQs when the number of ticks
> > > > exceeds
> > > > + * the value stored in those fields. This means the timer will run for
> > > > at least
> > > > + * one tick, so small values need to be clamped to correctly reflect
> > > > the timeout.
> > > > + */
> > > > +static inline unsigned int div_round_ticks(unsigned int val, unsigned
> > > > int
> > > > tick_duration,
> > > > +               unsigned int min_ticks)
> > > > +{
> > > > +       return max(min_ticks, DIV_ROUND_CLOSEST(val, tick_duration));
> > > 
> > > Are you sure that DIV_ROUND_CLOSEST is appropriate in those calculations
> > > (instead of DIV_ROUND_UP or DIV_ROUND_DOWN) ?
> > > 
> > [...]
> > 
> > > > +
> > > > +       timeout_ms = total_ticks * tick_ms;
> > > > +       ctrl->wdev.timeout = DIV_ROUND_CLOSEST(timeout_ms, 1000);
> > > > +
> > > 
> > > That means the actual timeout (and pretimeout) can be slightly larger
> > > than the real timeout. Is this really what you want ?
> > 
> > Is it a problem if the WDT times out later than specified by
> > watchdog_device.(pre)timeout?
> > I can see that premature timeouts would be an issue, but I don't suppose
> > it's problematic
> > if the WDT is pinged (slightly) sooner than actually required?
> > 
> 
> I am not concerned with early pings. However, if the timeout limit is set to a
> value
> lardger than the real timeout (eg the real timeout is 25.6 seconds and the
> timeout
> value is set to 26 seconds), the reset may occur a bit early. Granted, it
> doesn't
> matter much, but most driver authors would ensure that the timeout is set to
> 25 seconds
> (ie rounded down) in that situation.

I'll replace tick rounding with DIV_ROUND_UP, and timeout rounding with regular
flooring division. This results in a few timeout values being rounded up for the
coarsest tick duration, but those are then stable values.

Best,
Sander

> 
> > The coarsest ticks are 1342 ms, so it is not always possible to provide the
> > requested
> > (pre)timeout value, independent of the rounding scheme. Although I think it
> > should be
> > possible to replace timeout rounding by DIV_ROUND_UP (of total_ticks_ms),
> > and pretimeout
> > rounding by DIV_ROUND_DOWN (of phase2_ticks_ms), and keep stable timeouts
> > when alternating
> > between set_timeout/set_pretimeout.
> > 
> > > 
> > > > +       pretimeout_ms = phase2_ticks * tick_ms;
> > > > +       ctrl->wdev.pretimeout = DIV_ROUND_CLOSEST(pretimeout_ms, 1000);
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int otto_wdt_set_timeout(struct watchdog_device *wdev, unsigned
> > > > int val)
> > > > +{
> > > > +       struct otto_wdt_ctrl *ctrl = watchdog_get_drvdata(wdev);
> > > > +       unsigned long flags;
> > > > +       unsigned int ret;
> > > > +
> > > > +       if (watchdog_timeout_invalid(wdev, val))
> > > > +               return -EINVAL;
> > > 
> > > This is not supposed to happen because the calling code already performs
> > > range checks.
> > 
> > Right, I will drop the redundant check here and in set_pretimeout.
> > 
> > > 
> > [...]
> > > > +static int otto_wdt_restart(struct watchdog_device *wdev, unsigned long
> > > > reboot_mode,
> > > > +               void *data)
> > > > +{
> > > > +       struct otto_wdt_ctrl *ctrl = watchdog_get_drvdata(wdev);
> > > > +       u32 reset_mode;
> > > > +       u32 v;
> > > > +
> > > > +       devm_free_irq(ctrl->dev, ctrl->irq_phase1, ctrl);
> > > > +
> > > 
> > > Why is this needed (instead of, say, disabling the interrupt) ?
> > 
> > Disabling the interrupt should actually be enough. I'll replace the
> > devm_free_irq() with
> > disable_irq(). Somehow I didn't find disable_irq(), even though that was
> > what I was
> > looking for...
> > 
> > [...]
> > > > +
> > > > +       /*
> > > > +        * Since pretimeout cannot be disabled, min_timeout is twice the
> > > > +        * subsystem resolution. max_timeout is 44s at a bus clock of
> > > > 200MHz.
> > > > +        */
> > > > +       ctrl->wdev.min_timeout = 2;
> > > > +       max_tick_ms = otto_wdt_tick_ms(ctrl, OTTO_WDT_PRESCALE_MAX);
> > > > +       ctrl->wdev.max_timeout =
> > > > +               DIV_ROUND_CLOSEST(max_tick_ms *
> > > > OTTO_WDT_TIMEOUT_TICKS_MAX, 1000);
> > > 
> > > Any reason for using max_timeout instead of max_hw_heartbeat_ms ?
> > 
> > I must have missed this when I was looking at watchdog_device. Makes sense
> > to use
> > max_hw_heartbeat_ms since that reflects the actual value more accurately.
> > 
> > > 
> > 
> > Thanks for the feedback!
> > 
> > Best,
> > Sander
> > 
>