Re: [PATCH 0/6] watchdog: add watchdog pretimeout framework

Vladimir Zapolskiy <vladimir_zapolskiy@xxxxxxxxxx> · Tue, 24 Nov 2015 15:25:29 +0200

Hi Guenter,

On 24.11.2015 08:47, Guenter Roeck wrote:
> Hi Vladimir,
> 
> On 11/22/2015 04:38 PM, Vladimir Zapolskiy wrote:
>> Hi Guenter,
>>
>> On 21.11.2015 19:13, Guenter Roeck wrote:
>>> On 11/20/2015 11:11 PM, Vladimir Zapolskiy wrote:
>>>> The change adds a simple watchdog pretimeout framework infrastructure,
>>>> its purpose is to allow users to select a desired handling of watchdog
>>>> pretimeout events, which may be generated by a watchdog driver.
>>>>
>>>> The idea of adding this kind of a framework appeared after reviewing
>>>> several attempts to add hardcoded pretimeout event handling to some
>>>> watchdog driver and after a discussion with Guenter, see
>>>> https://lkml.org/lkml/2015/11/4/346
>>>>
>>>> By design every watchdog pretimeout governor may be compiled as a
>>>> kernel module, a user selects a default watchdog pretimeout
>>>> governor during compilation stage and can select another governor in
>>>> runtime.
>>>>
>>>> Watchdogs with WDIOF_PRETIMEOUT capability now have two device
>>>> attributes in sysfs: read/write pretimeout_governor attribute and read
>>>> only pretimeout_available_governors attribute.
>>>>
>>>> To throw a pretimeout event for further processing a watchdog driver
>>>> should call exported watchdog_notify_pretimeout(wdd) interface.
>>>>
>>>> In addition to the framework a number of simple watchdog pretimeout
>>>> governors are added for review.
>>>>
>>>
>>> Hi Vladimir,
>>>
>>> Excellent idea. I would suggest to simplify it a bit, though.
>>>
>>> Use only a single configuration flag, and bundle all governors together
>>> with the framework.
>>
>> the idea of having separated governors in kernel module format comes from a
>> need in one of my projects to create an own private kernel side governor,
>> bundling all of the governors together will noticeably complicate the
>> maintenance in my particular case.
>>
>> Plus the proposed view on the framework actually repeats with minor
>> adjustments 3 existing governor frameworks created for cpufreq, devfreq and
>> thermal subsystems, please review them, if you find some time. Cpufreq and
>> devfreq governors can be compiled and deployed as kernel modules, thermal
>> governors are bound to thermal.ko, all of them are selected on kernel
>> compilation stage, all governors are chosen in runtime by means of sysfs
>> device attribute interface, still some of the governors in every of the
>> frameworks mentioned above are pretty small.
>>
> 
> Hmm ... ok, I'll accept that. However, please do without the #ifdefs
> in the code. Thermal manages to select the default governor in an include
> file, and we should be able to do the same here as well. I prefer the
> approach taken there, with a pointer to the default governor and no flag.

Ok, if CONFIG_WATCHDOG_PRETIMEOUT_DEFAULT_GOV_* are moved from distributed
governor code to a centralized location, it might slightly complicate the
maintenance of private governors on my end, but I can cope with it, I believe.

> However, it should not be possible to unload a module if its governor
> is in use. Instead of taking a governor away from a watchdog by unloading
> its module, selecting a governor should increase the reference count
> on a module, thus preventing it from being unloaded.

Ok, this can be done, moreover it will simplify the design, because I had to
keep in mind a race where a pretimeout is reported but the governor is gone
before the workqueue task executes.
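
To make the reference-counting rule concrete, here is a minimal userspace
sketch, not code from this series. All names in it (struct gov, struct wdog,
gov_get, gov_put, wdog_set_gov, gov_unload) are hypothetical; in the kernel
the same idea would presumably map onto try_module_get() and module_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a governor module; not the series' structure. */
struct gov {
	const char *name;
	int refcount;	/* number of watchdogs currently using this governor */
	bool loaded;	/* false once the "module" has been unloaded */
};

/* Hypothetical stand-in for a watchdog device. */
struct wdog {
	struct gov *gov;
};

/* Take a reference; fails if the governor is missing or already unloaded. */
static bool gov_get(struct gov *g)
{
	if (!g || !g->loaded)
		return false;
	g->refcount++;
	return true;
}

/* Drop a reference previously taken with gov_get(). */
static void gov_put(struct gov *g)
{
	assert(g && g->refcount > 0);
	g->refcount--;
}

/* Switch a watchdog to a new governor; the old one is kept on failure. */
static bool wdog_set_gov(struct wdog *w, struct gov *new_gov)
{
	if (!gov_get(new_gov))
		return false;
	if (w->gov)
		gov_put(w->gov);
	w->gov = new_gov;
	return true;
}

/* Unloading is refused while any watchdog still holds a reference. */
static bool gov_unload(struct gov *g)
{
	if (g->refcount > 0)
		return false;
	g->loaded = false;
	return true;
}
```

With this shape a watchdog never ends up pointing at a governor that has
gone away, so the race between reporting and handling disappears by
construction.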

> We might also want to consider loading the default governor early,
> not as module. Not sure how messy that would be, though. I am a bit
> concerned if a governor doesn't get to run because its module is not
> loaded, even if it is the default (which is why I kind of dislike
> using modules). Maybe we should force-load the default governor module
> when the pretimeout code initializes, and prevent it from being unloaded.

Right, in this series you can see from Kconfig that a default governor is
always compiled into the image (WATCHDOG_PRETIMEOUT_DEFAULT_GOV_* is bool).
I think it is aligned with your vision, so I'll keep this design decision,
if you don't mind.

In practice the published code correctly handles the situation (i.e. no
oops or unexpected execution paths) where no governor is assigned, for
example if a default governor is converted to a module by a Kconfig change and
the module is unloaded. I have no objection to strictly requiring that at
any time one governor is connected to a watchdog; this will require some
code removal, and I'll do it.

>>> The governor code isn't that large that it warrants
>>> separate modules, much less separate configuration flags. Keep in mind
>>> that this will ultimately be used by distributions, and for those an
>>> a-b-c choice is always bad. We'll have to find something else to specify
>>> the default governor. Maybe make panic the primary default, and support
>>> a module parameter to change it.
>>
>> Here I also repeat cpufreq and thermal design (devfreq is a bit different),
>> please check that default governors for cpufreq and thermal are selected on
>> compilation stage.
>>
>> Regarding the primary default governor itself, I don't have any specific
>> preference, *if* the default governor can be selected on compilation stage.
>> Panic is fine by default, but probably not for everyone.
>>
> Ok.
> 
>> I'm not closely involved in any Linux distribution development and so I'm
>> not familiar with any potential problems there, but why a-b-c choice can not
>> be always reduced to a-b (drop module tristate option)? And how do
>> distributions handle e.g. cpufreq governors at the moment?
>>
>>> I don't think we should have per-watchdog sysfs attributes to change
>>> the governor. A global set of attributes would make more sense. Maybe
>>> this is possible through /proc/sys/, or just set it once with a
>>> module parameter.
>>
>> I personally dislike the global setting in this particular case, /proc/sys/
>> is way too system-wide (Greg will probably object to this interface as well),
>> a module parameter seems more acceptable, but it might make it less
>> straightforward to dynamically change the currently active governor.
>>
>> Also, because a system can have several independent watchdogs (mine has
>> three hardware watchdogs plus softdog, for example), a user may want to
>> configure them separately, so the limited functionality of a global
>> setting might be insufficient.
>>
>> In my opinion watchdog pretimeout events should be coupled with the devices,
>> so sysfs device attribute interface is the most appropriate one among
>> possible interfaces.
>>
> The problem here is that there would be one governor per watchdog. I don't think
> any of the other subsystems has multiple default governors. This makes it very
> hard for the user to configure the system.

There is one default governor for all devices, but a user has the possibility
to select another one at runtime for a particular watchdog.

Talking about other subsystems, for example this is what I have on my laptop:

  # find /sys/ | grep governor
  /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
  /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
  /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
  /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors
  /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
  /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors
  /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
  /sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors

This watchdog pretimeout framework change does something similar, but you'll
notice it only if you have 4 watchdogs.

> I really don't believe that there
> is value in having multiple governors for different watchdogs.

There is such value in my opinion: different watchdogs can play different
roles in a system, can have different parameters and can be managed (started,
stopped, pinged) differently. I can imagine a situation where 2-3 watchdogs
are synchronously configured to reboot a system in 60 seconds, but their
pretimeouts (not panic ones) are set to 30, 20 and 10 seconds.

99.9% of users have only one watchdog, so the choice between one governor for
all watchdogs and one governor per watchdog makes no difference to them, with
no additional configuration overhead involved, but 0.1% of users will suffer
from a lack of flexibility. And these 0.1% of users most probably come from
industrial/medical/automotive/aerospace and other areas of safety-critical
systems.

> Having said that, yes, you are right, all other governors do the same.
> So much for overkill ;-). Meaning even though I don't think it provides
> sufficient value and will make configuration more difficult than necessary,
> I'll accept your point.
> 
>> As a side note, I anticipate development of watchdog sysfs device attributes
>> in the nearest future, I vaguely remember there were some requests to add
>> some attributes (set/get time left, get started/stopped status etc.). IMHO
>> further development of binary ioctl() interfaces to watchdogs is less user
>> friendly.
>>
> 
> Yes, I already have those queued in my watchdog-next tree. I have no idea what
> Wim thinks about it, though.

I hope he approves.

>>> If a watchdog driver actually supports pretimeout
>>> is a different question. This should simplify the code a lot,
>>> since there would always be a well known governor to execute on
>>> a pretimeout.
>>
>> The answer depends on a design decision, should there be one pretimeout
>> handler for all watchdogs or separate attached handlers. As a user I vote
>> for improved flexibility.
>>
> I prefer simplified configuration. It would be great to have some others
> chime in with their opinion before we go too far along some route.
>

I support it. Wim, do you have an opinion?

>>> If we have to use workqueues, it would have to run on the highest
>>> possible priority.
>>
>> Right, we have to use a workqueue: in my project the work done by a
>> governor must be able to sleep.
>>
>>> I think it would be better to determine on a
>>> per-governor basis if a workqueue is needed (eg for userspace events).
>>> We don't need one for panic, or for noop.
>>
>> It makes sense, adding a .can_sleep flag like one defined by GPIO chips may
>> help.
>>
> Either that, or the governor itself implements the workqueue if needed.

If governors implement their own workqueues, this will result in duplicated
code and demand closer attention from maintainers. Passing a flag is simpler,
I'll do it.

> But a workqueue should not be mandatory if it is not needed. I can understand
> that your project may need one, but that doesn't mean that we should
> risk that the "panic" governor stalls because its workqueue never runs.

Sure, "panic" governor won't sleep.
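
As an illustration of the flag approach, here is a minimal userspace sketch
under stated assumptions, not the series' code. The structure pt_gov, the
helpers notify_pretimeout() and run_deferred(), the fixed-size array standing
in for a workqueue, and the two example handlers are all hypothetical. A
governor that cannot sleep is invoked directly, and only a governor with
can_sleep set is deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wdd;	/* opaque stand-in for a watchdog device */

/* Hypothetical governor descriptor carrying the proposed flag. */
struct pt_gov {
	const char *name;
	bool can_sleep;
	void (*handle)(struct wdd *wdd);
};

/* Tiny fixed-size queue standing in for a kernel workqueue. */
#define MAX_DEFERRED 8
struct deferred {
	struct pt_gov *gov;
	struct wdd *wdd;
};
static struct deferred queue[MAX_DEFERRED];
static size_t queued;

/* Example handlers that only count how often they were called. */
static int panic_calls;
static int userspace_calls;
static void panic_handle(struct wdd *wdd) { (void)wdd; panic_calls++; }
static void userspace_handle(struct wdd *wdd) { (void)wdd; userspace_calls++; }

/* Returns true if handled immediately, false if deferred (or dropped). */
static bool notify_pretimeout(struct pt_gov *gov, struct wdd *wdd)
{
	if (!gov->can_sleep) {
		gov->handle(wdd);	/* no queue on the non-sleeping path */
		return true;
	}
	if (queued < MAX_DEFERRED)
		queue[queued++] = (struct deferred){ gov, wdd };
	return false;
}

/* Run deferred handlers later, in a context where sleeping is allowed. */
static size_t run_deferred(void)
{
	size_t n = queued;

	for (size_t i = 0; i < n; i++)
		queue[i].gov->handle(queue[i].wdd);
	queued = 0;
	return n;
}
```

The point of the split is that the "panic" path never depends on the queue
being serviced, while a governor that needs to sleep still gets a context
in which it may do so.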

>> Because it is an additional configuration option, I've tried to avoid it
>> right from the beginning, but in general I have no objections to add it.
>>
> 
> Why would this be a configuration option (instead of a flag determined
> by the governor) ?

I used an incorrect word, here I meant a fixed "struct watchdog_governor"
definition.

--
With best wishes,
Vladimir