Sometimes the SoC internal watchdogs are inappropriate to restart the machine in a reliable manner. This change should help to handle more than one watchdog unit by adding a scope parameter. The framework always prefers the watchdog with the widest scope. For example a watchdog which is able to restart the whole machine (SoC + external devices) gets precedence over a watchdog which can restart the SoC only. Signed-off-by: Juergen Borleis <jbe@xxxxxxxxxxxxxx> --- Documentation/user/user-manual.rst | 1 + Documentation/user/watchdog.rst | 74 ++++++++++++++++++++++++++++++++++++++ drivers/watchdog/im28wd.c | 2 +- drivers/watchdog/imxwd.c | 2 +- drivers/watchdog/wd_core.c | 45 +++++++++++++++++++++-- include/watchdog.h | 6 ++-- 6 files changed, 123 insertions(+), 7 deletions(-) create mode 100644 Documentation/user/watchdog.rst diff --git a/Documentation/user/user-manual.rst b/Documentation/user/user-manual.rst index 0d6daee..8a32469 100644 --- a/Documentation/user/user-manual.rst +++ b/Documentation/user/user-manual.rst @@ -30,6 +30,7 @@ Contents: system-setup reset-reason system-reset + watchdog * :ref:`search` * :ref:`genindex` diff --git a/Documentation/user/watchdog.rst b/Documentation/user/watchdog.rst new file mode 100644 index 0000000..b98d11c --- /dev/null +++ b/Documentation/user/watchdog.rst @@ -0,0 +1,74 @@ +.. _watchdog_usage: + +Using a watchdog +---------------- + +Watchdogs are commonly used to prevent bad software from hanging the whole +system for ever. Sometimes a simple approach can help to make a system work +if hanging failures are happen very seldom: just restart the system and do +everything again in the same way as it was done when the system starts the +last time. + +But using a watchdog should always be the 'last resort' to keep a machine +working. The focus should still be on finding and fixing the bug ;) + +A more complex way to use a watchdog in a real-word example is when the state +frameworks comes into play. Then the watchdog's task isn't only to keep the +machine working. It also monitors the whole health of the machine including +hardware and software. Especially something like a firmware update can go +wrong: a wrong firmware was programmed (wrong release or for a different machine), +programming was incomplete due to a user intervention or power fail and so on. + +In this case the watchdog does not just restart the system if the software hangs, +it also provides 'additional' information about the firmware by this restart. +The barebox's state framework is now able to run some kind of state machine to handle +firmware updates in a correct manner: trying the new firmware once and if it fails falling +back to the previous firmware (if available) for example. + +Refer the :ref:`reset_reason` how to detect the reason why the bootloader runs. +This information can be used by the barebox's state framework. + +.. _watchdog_restart_pitfalls: + +Watchdog Pitfalls +~~~~~~~~~~~~~~~~~ + +If a watchdog triggers a machine restart it suffers from the same issues like +a regular user triggered system machine restart. Refer :ref:`system_reset_pitfalls` +for further details. +So keep this in mind when you select an available watchdog on your machine for +this task. And if you are a hardware designer keep this in mind even more, and +provide a reliable restart source for the software developers and to keep their +headache low. + +Watchdogs from the developers point of view +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Watchdogs gets registered in barebox with a scope. When you register your own +watchdog driver, check its hardware scope carefully and use one of the +definitions from the file ``include/fscope.h``. + +The list of defined scopes (defined in file ``include/fscope.h``): + +* ``FEATURE_SCOPE_UNKNOWN``: completely useless watchdog, maybe a last resort.. +* ``FEATURE_SCOPE_CPU``: this watchdog is able to restart the core CPU only. + Regarding to the issues in :ref:`system_reset_pitfalls` this kind of watchdog + seems more or less useless. +* ``FEATURE_SCOPE_SOC``: this watchdog is able to restart the whole SoC. Regarding + to the issues in :ref:`system_reset_pitfalls` this scope is more reliable, but + depends on the machine and its hardware design if is able to bring the machine + back into life under every circumstance. +* ``FEATURE_SCOPE_MACHINE``: it is able to restart the whole machine and does + the same like a real ``POR`` does. Best scope and always reliable. + +The selected scope is very important because barebox will always use +the watchdog with the best available scope. + +But that is true only for watchdogs used in barebox and as long barebox is +running. + +If an operating system runs later on, it is the task of this OS to use a watchdog +with a correct scope. Otherwise it suffers from the :ref:`system_reset_pitfalls` +as well. This is even more important if the state framework is used. That means +barebox and the operating system must use the same watchdog in order to check +and change the states correctly. diff --git a/drivers/watchdog/im28wd.c b/drivers/watchdog/im28wd.c index 5781387..d885e68 100644 --- a/drivers/watchdog/im28wd.c +++ b/drivers/watchdog/im28wd.c @@ -206,7 +206,7 @@ static int imx28_wd_probe(struct device_d *dev) /* disable the debug feature to ensure a working WD */ writel(0x00000000, priv->regs + MXS_RTC_DEBUG); - rc = watchdog_register(&priv->wd); + rc = watchdog_register(&priv->wd, FEATURE_SCOPE_SOC); if (rc != 0) goto on_error; diff --git a/drivers/watchdog/imxwd.c b/drivers/watchdog/imxwd.c index cdf3d40..f9ec57c 100644 --- a/drivers/watchdog/imxwd.c +++ b/drivers/watchdog/imxwd.c @@ -191,7 +191,7 @@ static int imx_wd_probe(struct device_d *dev) priv->dev = dev; if (IS_ENABLED(CONFIG_WATCHDOG_IMX)) { - ret = watchdog_register(&priv->wd); + ret = watchdog_register(&priv->wd, FEATURE_SCOPE_SOC); if (ret) goto on_error; } diff --git a/drivers/watchdog/wd_core.c b/drivers/watchdog/wd_core.c index 3d0cfc6..79eede3 100644 --- a/drivers/watchdog/wd_core.c +++ b/drivers/watchdog/wd_core.c @@ -13,22 +13,50 @@ */ #include <common.h> +#include <init.h> #include <command.h> #include <errno.h> #include <linux/ctype.h> #include <watchdog.h> +#include <globalvar.h> /* * Note: this simple framework supports one watchdog only. */ static struct watchdog *watchdog; +static unsigned watchdog_scope; -int watchdog_register(struct watchdog *wd) +static void watchdog_update_global_info(void) { - if (watchdog != NULL) - return -EBUSY; + const char *scope; + + switch (watchdog_scope) { + case FEATURE_SCOPE_CPU: + scope = NAME_FEATURE_SCOPE_CPU; + break; + case FEATURE_SCOPE_SOC: + scope = NAME_FEATURE_SCOPE_SOC; + break; + case FEATURE_SCOPE_MACHINE: + scope = NAME_FEATURE_SCOPE_MACHINE; + break; + default: + scope = NAME_FEATURE_SCOPE_UNKNOWN; + break; + } + globalvar_set_match("system.wd.scope", scope); +} + +int watchdog_register(struct watchdog *wd, enum f_scope scope) +{ + /* ignore a lower or same priority, it isn't a failure */ + if (scope <= watchdog_scope) + return 0; watchdog = wd; + watchdog_scope = scope; + watchdog_update_global_info(); + return 0; } EXPORT_SYMBOL(watchdog_register); @@ -39,6 +67,9 @@ int watchdog_deregister(struct watchdog *wd) return -ENODEV; watchdog = NULL; + watchdog_scope = FEATURE_SCOPE_UNKNOWN; + watchdog_update_global_info(); + return 0; } EXPORT_SYMBOL(watchdog_deregister); @@ -55,3 +86,11 @@ int watchdog_set_timeout(unsigned timeout) return watchdog->set_timeout(watchdog, timeout); } EXPORT_SYMBOL(watchdog_set_timeout); + +static int watchdog_capability_init(void) +{ + globalvar_add_simple("system.wd.scope", NAME_FEATURE_SCOPE_UNKNOWN); + + return 0; +} +coredevice_initcall(watchdog_capability_init); diff --git a/include/watchdog.h b/include/watchdog.h index 7e37b7c..9822166 100644 --- a/include/watchdog.h +++ b/include/watchdog.h @@ -13,16 +13,18 @@ #ifndef INCLUDE_WATCHDOG_H # define INCLUDE_WATCHDOG_H +#include <fscope.h> + struct watchdog { int (*set_timeout)(struct watchdog *, unsigned); }; #ifdef CONFIG_WATCHDOG -int watchdog_register(struct watchdog *); +int watchdog_register(struct watchdog *, enum f_scope); int watchdog_deregister(struct watchdog *); int watchdog_set_timeout(unsigned); #else -static inline int watchdog_register(struct watchdog *w) +static inline int watchdog_register(struct watchdog *w, enum f_scope s) { return 0; } -- 2.1.4 _______________________________________________ barebox mailing list barebox@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/barebox