Re: [PATCH 0/6] watchdog: add watchdog pretimeout framework

Guenter Roeck <linux@xxxxxxxxxxxx> · Mon, 23 Nov 2015 22:47:50 -0800

Hi Vladimir,

On 11/22/2015 04:38 PM, Vladimir Zapolskiy wrote:
Hi Guenter,

On 21.11.2015 19:13, Guenter Roeck wrote:
On 11/20/2015 11:11 PM, Vladimir Zapolskiy wrote:
The change adds a simple watchdog pretimeout framework infrastructure,
its purpose is to allow users to select a desired handling of watchdog
pretimeout events, which may be generated by a watchdog driver.

The idea of adding this kind of a framework appeared after reviewing
several attempts to add hardcoded pretimeout event handling to some
watchdog driver and after a discussion with Guenter, see
https://lkml.org/lkml/2015/11/4/346

By design every watchdog pretimeout governor may be compiled as a
kernel module, a user selects a default watchdog pretimeout
governor during compilation stage and can select another governor in
runtime.

Watchdogs with WDIOF_PRETIMEOUT capability now have two device
attributes in sysfs: read/write pretimeout_governor attribute and read
only pretimeout_available_governors attribute.

To throw a pretimeout event for further processing a watchdog driver
should call exported  watchdog_notify_pretimeout(wdd) interface.

In addition to the framework a number of simple watchdog pretimeout
governors are added for review.

Hi Vladimir,

Excellent idea. I would suggest to simplify it a bit, though.

Use only a single configuration flag, and bundle all governors together
with the framework.

the idea of having separated governors in kernel module format comes from a
need in one of my projects to create an own private kernel side governor,
bundling all of the governors together will noticeably complicate the
maintenance in my particular case.

Plus the proposed view on the framework actually repeats with minor
adjustments 3 existing governor frameworks created for cpufreq, devfreq and
thermal subsystems, please review them, if you find some time. Cpufreq and
devfreq governors can be compiled and deployed as kernel modules, thermal
governors are bound to thermal.ko, all of them are selected on kernel
compilation stage, all governors are chosen in runtime by means of sysfs
device attribute interface, still some of the governors in every of the
frameworks mentioned above are pretty small.

Hmm ... ok, I'll accept that. However, please do without the #ifdefs
in the code. Thermal manages to select the default governor in an include
file, and we should be able to do the same here as well. I prefer the
approach taken there, with a pointer to the default governor and no flag.

However, it should not be possible to unload a module if its governor
is in use. Instead of taking a governor away from a watchdog by unloading
its module, selecting a governor should increase the reference count
on a module, thus preventing it from being unloaded.

We might also want to consider loading the default governor early,
not as module. Not sure how messy that would be, though. I am a bit
concerned if a governor doesn't get to run because its module is not
loaded, even if it is the default (which is why I kind of dislike
using modules). Maybe we should force-load the default governor module
when the pretimeout code initializes, and prevent it from being unloaded.

The governor code isn't that large that it warrants
separate modules, much less separate configuration flags. Keep in mind
that this will ultimately be used by distributions, and for those an
a-b-c choice is always bad. We'll have to find something else to specify
the default governor. Maybe make panic the primary default, and support
a module parameter to change it.

Here I also repeat cpufreq and thermal design (devfreq is a bit different),
please check that default governors for cpufreq and thermal are selected on
compilation stage.

Regarding the primary default governor itself, I don't have any specific
preference, *if* the default governor can be selected on compilation stage.
Panic is fine by default, but probably not for everyone.

Ok.

I'm not closely involved in any Linux distribution development and so I'm
not familiar with any potential problems there, but why a-b-c choice can not
be always reduced to a-b (drop module tristate option)? And how do
distributions handle e.g. cpufreq governors at the moment?

I don't think we should have per-watchdog sysfs attributes to change
the governor. A global set of attributes would make more sense. Maybe
this is possible through /proc/sys/, or just set it once with a
module parameter.

I personally dislike the global setting in this particular case, /proc/sys/
is too way system wide (Greg probably will object this interface also),
module parameter setting seems to be more acceptable, but it might be less
straightforward to dynamically change the currently active governor.

Also because a system can have several independent watchdogs (my one have
three hardware watchdogs plus softdog, for example), potentially a user
wants to configure them separately, the limited functionality by means of a
global setting might be insufficient.

In my opinion watchdog pretimeout events should be coupled with the devices,
so sysfs device attribute interface is the most appropriate one among
possible interfaces.

The problem here is that there would be one governor per watchdog. I don't think
any of the other subsystems has multiple default governors. This makes it very
hard for the user to configure the system. I really don't believe that there
is value in having multiple governors for different watchdogs.

Having said that, yes, you are right, all other governors do the same.
So much for overkill ;-). Meaning even though I don't think it provides
sufficient value and will make configuration more difficult than necessary,
I'll accept your point.

As a side note, I anticipate development of watchdog sysfs device attributes
in the nearest future, I vaguely remember there were some requests to add
some attributes (set/get time left, get started/stopped status etc.). IMHO
further development of binary ioctl() interfaces to watchdogs is less user
friendly.

Yes, I already have those queued in my watchdog-next tree. I have no idea what
Wim thinks about it, though.

If a watchdog driver actually supports pretimeout
is a different question. This should simplify the code a lot,
since there would always be a well known governor to execute on
a pretimeout.

The answer depends on a design decision, should there be one pretimeout
handler for all watchdogs or separate attached handlers. As a user I vote
for improved flexibility.

I prefer simplified configuration. It would be great to have some others
chime in with their opinion before we go too far along some route.

If we have to use workqueues, it would have to run on the highest
possible priority.

Right, we have to use a workqueue, due to my project demands a work done by
a governor can sleep.

I think it would be better to determine on a
per-governor basis if a workqueue is needed (eg for userspace events).
We don't need one for panic, or for noop.

It makes sense, adding a .can_sleep flag like one defined by GPIO chips may
help.

Either that, or the governor itself implements the workqueue if needed.
But a workqueue should not be mandatory if it is not needed. I can understand
that your project may need one, but that doesn't mean that we should
risk that the "panic" governor stalls because its workqueue never runs.

Because it is an additional configuration option, I've tried to avoid it
right from the beginning, but in general I have no objections to add it.

Why would this be a configuration option (instead of a flag determined
by the governor) ?

Thanks,
Guenter

--
To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html