On 9/19/22 02:33, Jean Delvare wrote:
Hi Guenter,
A few questions from an old discussion:
On Mon, 8 Aug 2022 04:36:42 -0700, Guenter Roeck wrote:
On 8/5/22 15:07, Jean Delvare wrote:
To be honest, I'm not sold on the idea of a software-emulated
maximum timeout value above what the hardware can do, but if doing
that makes sense in certain situations, then I believe it should be
implemented as a boolean flag (named emulate_large_timeout, for
example) to complement max_timeout instead of a separate time value.
Is there a reason I'm missing, why it was not done that way?
There are watchdogs with very low maximum timeout values, sometimes less than
3 seconds. gpio-wdt is one example - some have a maximum value of 2.5 seconds.
rzn1_wd is even more extreme with a maximum of 1 second. With such low values,
accuracy is important, second-based limits are insufficient, and there is an
actual need for software timeout handling on top of hardware.
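
As a rough illustration of software timeout handling on top of a short
hardware limit, here is a simplified model in C. It is a sketch under stated
assumptions, not the watchdog core's actual code: the function names
ping_interval_ms() and should_ping_hw() are made up for this example, and
all times are plain millisecond counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model, not the kernel's implementation.
 * hw_max_ms:       longest timeout the hardware can be programmed with
 * soft_timeout_ms: timeout requested by user space
 */

/* How often the software layer should ping the hardware. */
static inline uint32_t ping_interval_ms(uint32_t hw_max_ms,
					uint32_t soft_timeout_ms)
{
	uint32_t limit;

	assert(hw_max_ms > 0);
	limit = soft_timeout_ms < hw_max_ms ? soft_timeout_ms : hw_max_ms;
	/* Half the limit leaves some margin for scheduling delays. */
	return limit / 2;
}

/*
 * Should the software layer still ping the hardware at now_ms?
 * Pinging stops once another ping would keep the hardware alive past
 * the software deadline, so the hardware fires at that deadline.
 */
static inline bool should_ping_hw(uint64_t now_ms, uint64_t last_user_ping_ms,
				  uint32_t soft_timeout_ms, uint32_t hw_max_ms)
{
	uint64_t deadline = last_user_ping_ms + soft_timeout_ms;

	return now_ms + hw_max_ms <= deadline;
}
```

With a 2.5 second hardware limit and a 60 second user-space timeout, this
model pings the hardware every 1250 ms and stops 57.5 seconds after the last
user-space ping, so the hardware expires at the 60 second mark.
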
Out of curiosity, what prevents user-space itself from pinging
/dev/watchdog every 0.5 second? I assume hardware using such watchdog
devices is "special" and would be running finely tuned user-space, so
the process pinging /dev/watchdog could be given higher priority or
even real-time status to ensure it runs without delays. Is that not
sufficient?
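
For reference, the kind of user-space pinger described above could look
roughly like the sketch below. It assumes that writing a byte to the
watchdog device counts as a keepalive and that the process is allowed to
request SCHED_FIFO; the wdt_*() helper names and the count parameter are
made up for this example and are not part of any existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <time.h>
#include <unistd.h>

/* Advance an absolute timestamp by ms milliseconds, normalizing tv_nsec. */
static inline void timespec_add_ms(struct timespec *ts, long ms)
{
	ts->tv_sec += ms / 1000;
	ts->tv_nsec += (ms % 1000) * 1000000L;
	if (ts->tv_nsec >= 1000000000L) {
		ts->tv_sec += 1;
		ts->tv_nsec -= 1000000000L;
	}
}

/* Open the device for writing, e.g. wdt_open("/dev/watchdog"). */
static inline int wdt_open(const char *path)
{
	return open(path, O_WRONLY);
}

/* Assumption: writing any byte to the device is treated as a ping. */
static inline int wdt_ping(int fd)
{
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Request SCHED_FIFO; needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO. */
static inline int wdt_request_realtime(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/*
 * Ping fd every period_ms. Sleeping until absolute deadlines keeps
 * wakeup delays from accumulating. count < 0 means loop forever.
 */
static inline int wdt_ping_loop(int fd, long period_ms, long count)
{
	struct timespec next;

	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;
	while (count < 0 || count-- > 0) {
		if (wdt_ping(fd) != 0)
			return -1;
		timespec_add_ms(&next, period_ms);
		/* clock_nanosleep() returns the error number directly. */
		while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				       &next, NULL) == EINTR)
			;
	}
	return 0;
}
```

A caller would open the device with wdt_open(), optionally call
wdt_request_realtime(), and then run wdt_ping_loop(fd, 500, -1). Whether
that is reliable enough on a loaded system is exactly the open question.
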
It took us forever to get the in-kernel support stable, using the right timers
and making sure that the kernel actually executes the code fast enough. Maybe
that would work nowadays from a userspace process with the right permissions,
but I would not trust it. Then there is watchdog support in, for example,
systemd. I would not want to force users to run systemd as a high-priority
real-time process just to make an odd watchdog work. I also would not want to
tell people that they must not use the systemd watchdog timer to make their
watchdog work.
Also, there is no guarantee that odd hardware with a weird watchdog is
actually always used in an application where such a fast timeout is needed or
even wanted.
On top of that, the code in the kernel also now supports "ping until opened"
for systems where the watchdog is already running when the system boots.
Overall, I don't think it would be a good idea to revert the in-kernel support
of pinging watchdogs.
At the same time, there is actually a need to make timeouts millisecond-based
instead of second-based, for uses such as medical devices where timeouts need
to be short and accurate. The only reason for not implementing this is that
the proposals I have seen so far (including mine) were too messy for my liking,
and I never had the time to clean them up. Reverting millisecond support would
be the completely wrong direction.
I might look into this at some point (for example as a SUSE Hackweek
project). Did you post your work somewhere? I'd like to take a look.
There was one submission from someone else if I recall correctly, but mine never
got to the point where it was submittable.
Guenter