Early watchdog resets and watchdog kernel API changes

Hi,

We had an earlier discussion about the "early_timeout_sec" device tree property, which we could use to ensure that the watchdog HW resets the device after the given timeout at boot up. If user space does not open the watchdog device, or if a kernel crash prevents user space from opening it, the device would be reset. The discussion stopped soon after we more or less agreed that a generic approach should be used instead of implementing the behaviour in each driver. Unfortunately, the watchdog core is currently too limited for that.
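
A minimal sketch of the early-reset rule, assuming the core (not the driver) owns the decision. The function name, the conversion of early_timeout_sec to milliseconds, and the "0 means no early timeout" convention are hypothetical, not existing kernel API. The core keeps the hardware alive on its own only until user space opens the device or the early timeout expires; after that it stops pinging, so an unattended device gets reset.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: should the watchdog core keep pinging the
 * hardware by itself while nobody has opened /dev/watchdog yet?
 * early_timeout_ms models early_timeout_sec converted to milliseconds;
 * 0 is assumed to mean "no early timeout configured".
 */
static bool core_pings_unattended(unsigned long ms_since_boot,
				  bool userspace_opened,
				  unsigned long early_timeout_ms)
{
	if (userspace_opened)
		return false;	/* keepalive is now driven by user space */
	if (early_timeout_ms == 0)
		return true;	/* assumption: no limit set, keep the device alive */
	/* Stop pinging once the early timeout elapses so the HW resets. */
	return ms_since_boot < early_timeout_ms;
}
```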

I now had some spare time and started to look into whether I could come up with a patch. I browsed through several watchdog drivers, and quite a few of them work around the same problem: the hardware watchdog timeout is far too short to be convenient for user space. That is, the hardware may need petting every 250 ms or so, while a 1 second petting interval from user space is quite common. Many drivers work around this in a similar manner. The min_timeout and max_timeout parameters in the watchdog_device structure are the timeout limits exposed to user space. The driver itself uses different timeout limits, and kernel timers are used to bridge the gap between what user space expects and what the hardware allows.
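
To make the gap concrete, here is a simplified model of what such a driver-private kernel timer achieves, assuming it pings the hardware at half the hardware maximum timeout until the user-space deadline passes. The function name and the half-interval policy are illustrative assumptions, not taken from any particular driver.

```c
/*
 * Simplified model of the driver-private timer pattern (an assumption,
 * not code from any specific driver): user space pets at t = 0 and then
 * stops. The kernel timer pings the hardware every hw_max_ms / 2 until
 * the user-space deadline, then stops; the hardware fires hw_max_ms
 * after the last ping it received. Returns that reset time in ms.
 */
static unsigned long reset_time_ms(unsigned long user_timeout_ms,
				   unsigned long hw_max_ms)
{
	unsigned long interval = hw_max_ms / 2 ? hw_max_ms / 2 : 1;
	unsigned long last_hw_ping = 0;	/* the user-space pet also pings the HW */
	unsigned long t;

	for (t = interval; t < user_timeout_ms; t += interval)
		last_hw_ping = t;	/* timer tick forwards a ping to the HW */

	return last_hw_ping + hw_max_ms;
}
```

With a 250 ms hardware limit and a 1 s user timeout, the model resets at 1125 ms: after the user deadline, and at most one hardware period past it.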

So, what we could do is change the watchdog kernel API to be more aware of the actual hardware constraints and take over some of the driver functionality that has been implemented over and over again in many places. This would also make it easier to implement new features, such as handling the early_timeout_sec parameter discussed earlier.

The way I thought it could be done is this: we add new hw_timeout_min and hw_timeout_max parameters to the watchdog_device structure. These describe the actual hardware limitations. The current min_timeout and max_timeout parameters would continue to serve as the user space limits for the watchdog, as they already do in many drivers. If user space uses a longer watchdog timeout than the hardware supports, the watchdog core would use generic timer code to ping the watchdog driver, preventing the hardware watchdog from expiring before the user space timeout has expired. One open question is why we need to limit the user space timeout values at all if the kernel is working around the HW constraints anyway. The watchdog core could simply satisfy any (reasonable?) timeout requested by the user.
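
A sketch of how the proposed fields might interact, assuming hw_timeout_min/hw_timeout_max are in milliseconds while the existing user-facing limits stay in seconds. struct sketch_wdd is a stand-in, not the kernel's struct watchdog_device, and both helper names are hypothetical.

```c
#include <stdbool.h>

/*
 * Stand-in for the fields under discussion; NOT struct watchdog_device.
 * User-facing limits are in seconds; the proposed hardware limits are
 * assumed to be in milliseconds.
 */
struct sketch_wdd {
	unsigned int timeout;		/* s, requested by user space */
	unsigned int min_timeout;	/* s, user-space limit */
	unsigned int max_timeout;	/* s, user-space limit */
	unsigned int hw_timeout_min;	/* ms, proposed hardware limit */
	unsigned int hw_timeout_max;	/* ms, proposed hardware limit */
};

/* True if the core must bridge the gap with its own timer. */
static bool core_needs_helper_timer(const struct sketch_wdd *wdd)
{
	return (unsigned long)wdd->timeout * 1000UL > wdd->hw_timeout_max;
}

/* Timeout to program into the hardware, clamped to its limits (ms). */
static unsigned int hw_programmed_timeout_ms(const struct sketch_wdd *wdd)
{
	unsigned long want = (unsigned long)wdd->timeout * 1000UL;

	if (want < wdd->hw_timeout_min)
		return wdd->hw_timeout_min;
	if (want > wdd->hw_timeout_max)
		return wdd->hw_timeout_max;
	return (unsigned int)want;
}
```

For example, a 60 s user timeout on hardware limited to 16000 ms would program 16000 ms into the hardware and need the core's helper timer, while a 10 s timeout fits the hardware directly.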

For this we would also need a new set of flags describing the hardware capabilities. We would also need a generic function for parsing the generic watchdog device tree properties, so that each driver doesn't need to implement its own parsing for the same things. On non-devicetree platforms this function could obtain the parameters by other means, such as the kernel command line or ACPI.
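
One possible shape for the capability flags and the generic parser, sketched in plain C. The flag names, struct sketch_wdt_params, and the name/value table standing in for device tree, command line, or ACPI are all hypothetical; "timeout-sec" is the existing generic watchdog binding property, while early_timeout_sec is the property from the earlier discussion.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hardware capability flags. */
#define SKETCH_WDT_HW_CAN_STOP		(1u << 0)	/* HW can be stopped once started */
#define SKETCH_WDT_HW_RUNNING_AT_BOOT	(1u << 1)	/* e.g. left running by the bootloader */

struct sketch_wdt_params {
	unsigned int hw_flags;		/* filled in by the driver */
	unsigned int timeout_sec;	/* from "timeout-sec" */
	unsigned int early_timeout_sec;	/* from the proposed early_timeout_sec */
};

/* Name/value pair standing in for a DT property, cmdline option or ACPI entry. */
struct sketch_prop {
	const char *name;
	const char *value;
};

/* Generic parser: unknown properties are ignored, missing ones keep defaults. */
static void sketch_parse_params(struct sketch_wdt_params *p,
				const struct sketch_prop *props, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned long v = strtoul(props[i].value, NULL, 10);

		if (strcmp(props[i].name, "timeout-sec") == 0)
			p->timeout_sec = (unsigned int)v;
		else if (strcmp(props[i].name, "early_timeout_sec") == 0)
			p->early_timeout_sec = (unsigned int)v;
	}
}
```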

For this I'm proposing a watchdog_init_params() function that would replace the watchdog_init_timeout() call in current drivers. The core could also use this function to know whether a driver has been converted to supply the new information about its HW capabilities, and therefore whether the core should take over some of the generic watchdog behaviour from the driver. If watchdog_init_params() is not called before watchdog_register_device(), the core knows to treat the driver as before. This way drivers can be converted and cleaned up one by one rather than all at once. I'd start with at91sam9_wdt, as that's the driver I have a test environment for right now.
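
A sketch of the opt-in logic, assuming the core records that the new init call ran and checks it at registration time. sketch_init_params() and sketch_register() stand in for the proposed watchdog_init_params() and for watchdog_register_device(); their signatures and the field names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal stand-in device; the field names are hypothetical. */
struct sketch_wdd2 {
	unsigned int timeout;		/* s */
	unsigned int hw_timeout_max;	/* ms, proposed */
	bool params_initialized;	/* set only by the new init call */
	bool core_manages_pings;	/* decided at registration time */
};

/* Stand-in for the proposed watchdog_init_params(). */
static int sketch_init_params(struct sketch_wdd2 *wdd, unsigned int timeout,
			      unsigned int hw_timeout_max_ms)
{
	if (hw_timeout_max_ms == 0)
		return -EINVAL;	/* a converted driver must describe its HW */
	wdd->timeout = timeout;
	wdd->hw_timeout_max = hw_timeout_max_ms;
	wdd->params_initialized = true;
	return 0;
}

/* Stand-in for watchdog_register_device(): unconverted drivers keep the old behaviour. */
static int sketch_register(struct sketch_wdd2 *wdd)
{
	wdd->core_manages_pings = wdd->params_initialized;
	return 0;
}
```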

I don't have a patch for this yet, but I'm working on it. I thought writing this email would help me clarify what I am really doing here, and get some feedback to help ensure this ends up generic.

Any thoughts?

-Timo
--
To unsubscribe from this list: send the line "unsubscribe linux-watchdog" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



