RE: [PATCH] TCO Watchdog warning interrupt driver creation

The panic is there only for debugging purposes, at the first expiry.
The watchdog timeout behavior (second expiry) is unchanged.
I have added the panic as a config option, enabled by default.
François-Nicolas

From 5e1680aa0df9f49459cdc7211ba80d6934f65fbd Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@xxxxxxxxx>
Date: Tue, 20 Jan 2015 14:55:42 +0100
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature helps to root-cause watchdog expirations.
It is activated by the module parameter 'warn_irq' (disabled by default).

Upon the first expiration of the TCO watchdog, a warning interrupt is fired and
the interrupt handler dumps the registers and call stacks of all available CPUs.
The TCO watchdog then reloads with a 2.4 second timeout for the second
expiration.
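The two-stage expiry sequence above can be modeled in a few lines of userspace C. This is a hypothetical sketch, not driver code: `struct tco_model`, `tco_tick()` and `tco_kick()` are illustrative names, times are in milliseconds, the 2.4 second second stage is taken from the description above, and a kick is assumed to cancel a pending second stage and restart the full heartbeat.

```c
#include <assert.h>

/* Hypothetical model of the two-stage TCO expiry; all times in ms. */
#define TCO_SECOND_STAGE_MS 2400  /* second-stage timeout from the text */

enum tco_stage { TCO_FIRST_STAGE, TCO_SECOND_STAGE, TCO_RESET };

struct tco_model {
	enum tco_stage stage;
	long remaining_ms;   /* time left in the current countdown */
	long heartbeat_ms;   /* first-stage timeout, reloaded on every kick */
	int warn_irqs;       /* number of warning interrupts fired so far */
};

/* Kicking reloads the heartbeat and cancels a pending second stage. */
static void tco_kick(struct tco_model *t)
{
	if (t->stage == TCO_RESET)
		return;  /* too late: the platform has already been reset */
	t->stage = TCO_FIRST_STAGE;
	t->remaining_ms = t->heartbeat_ms;
}

/* Advance the model by elapsed_ms with no kick in between. */
static void tco_tick(struct tco_model *t, long elapsed_ms)
{
	while (t->stage != TCO_RESET && elapsed_ms >= t->remaining_ms) {
		elapsed_ms -= t->remaining_ms;
		if (t->stage == TCO_FIRST_STAGE) {
			/* First expiry: warning interrupt, handler dumps backtraces. */
			t->warn_irqs++;
			t->stage = TCO_SECOND_STAGE;
			t->remaining_ms = TCO_SECOND_STAGE_MS;
		} else {
			/* Second expiry without a kick: hardware reset. */
			t->stage = TCO_RESET;
			t->remaining_ms = 0;
		}
	}
	if (t->stage != TCO_RESET)
		t->remaining_ms -= elapsed_ms;
}
```

With a 30 s heartbeat, ticking 30000 ms fires one warning interrupt; a kick within the next 2400 ms returns the model to the first stage, while letting those 2400 ms elapse ends in `TCO_RESET`.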

If CONFIG_ITCO_WARNING_PANIC is set, the warning interrupt handler also calls
panic(), which runs the panic notifiers; how the platform then reboots depends
on the CONFIG_PANIC_TIMEOUT value:

- If CONFIG_PANIC_TIMEOUT is zero or 3 seconds or more, the TCO watchdog
resets the platform at the second expiration, unless the TCO has been kicked
again before then.

- If CONFIG_PANIC_TIMEOUT is negative, the platform reboots immediately
(emergency restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, the platform reboots after a 1 or
2 second delay (emergency restart procedure).
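The three cases above reduce to comparing CONFIG_PANIC_TIMEOUT with the 2.4 second second-stage countdown. A minimal sketch, assuming nothing kicks the TCO while panic() runs; `reboot_path_for()` and the enum names are hypothetical. Any value of 3 seconds or more lets the watchdog win, because the 2.4 second countdown ends first.

```c
#include <assert.h>

/* Second-stage TCO countdown from the description, in ms. */
#define TCO_SECOND_STAGE_MS 2400

/* Hypothetical labels for the three outcomes listed above. */
enum reboot_path {
	REBOOT_BY_TCO_RESET,       /* second watchdog expiry resets the platform */
	REBOOT_IMMEDIATE,          /* panic() restarts at once (negative timeout) */
	REBOOT_AFTER_PANIC_DELAY,  /* panic() waits panic_timeout s, then restarts */
};

static enum reboot_path reboot_path_for(int panic_timeout_s)
{
	if (panic_timeout_s < 0)
		return REBOOT_IMMEDIATE;
	/* Zero: panic() never reboots on its own, so the watchdog does. */
	if (panic_timeout_s == 0)
		return REBOOT_BY_TCO_RESET;
	/* panic() reboots first only if its delay ends before the countdown. */
	if ((long)panic_timeout_s * 1000 < TCO_SECOND_STAGE_MS)
		return REBOOT_AFTER_PANIC_DELAY;
	return REBOOT_BY_TCO_RESET;
}
```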

Change-Id: I48dcb9d38218c8218e35f9969f064b9d5cf316f1
Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@xxxxxxxxx>
---
 drivers/watchdog/Kconfig    |   13 +++++++++
 drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 79d2589..15c3807 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -674,6 +674,19 @@ config ITCO_VENDOR_SUPPORT
 	  devices. At this moment we only have additional support for some
 	  SuperMicro Inc. motherboards.
 
+config ITCO_WARNING_PANIC
+	bool "Intel TCO Timer/Watchdog panic on warning interrupt"
+	depends on ITCO_WDT
+	default y
+	---help---
+	  Force a call to panic() when the TCO warning interrupt occurs.
+
+	  The warning interrupt is raised when the warn_irq module parameter
+	  is set and the TCO timer expires for the first time.
+
+	  If not set, only CPU backtraces are dumped: panic() is not called
+	  and no panic notifiers are run.
+
 config IT8712F_WDT
 	tristate "IT8712F (Smart Guardian) Watchdog Timer"
 	depends on X86
diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..e7c5169 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,9 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
@@ -126,6 +137,13 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Dump all CPU backtraces at first watchdog timer expiration (default=0)");
+
+static bool warn_irq_panic = IS_ENABLED(CONFIG_ITCO_WARNING_PANIC);
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +218,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	trigger_all_cpu_backtrace();
+	if (warn_irq_panic)
+		panic("Kernel Watchdog");
+
+	return ACPI_INTERRUPT_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx\n", gpe);
+	return 0;
+}
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+
 static int __init iTCO_wdt_init_module(void)
 {
 	int err;
@@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
 	pr_info("Watchdog Module Unloaded\n");
 }
 
-- 
1.7.9.5
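The `warn_irq_panic` initializer depends on how Kconfig exposes a bool option to C: when the option is set to y, the generated autoconf.h contains `#define CONFIG_<NAME> 1`, and when it is unset the symbol is simply not defined, so a bare `CONFIG_ITCO_WARNING_PANIC` initializer would not compile with the option disabled. The kernel's IS_ENABLED() macro evaluates to 1 or 0 in both cases. Below is a userspace sketch of the placeholder trick behind its bool-only part; the `DEMO_` names and `CONFIG_DEMO_*` symbols are hypothetical, and an extra trailing `0` keeps the variadic argument list non-empty for strict ISO C.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated autoconf.h: one option set to y, one left undefined. */
#define CONFIG_DEMO_ON 1
/* CONFIG_DEMO_OFF is deliberately not defined. */

/* "1" pastes into a macro that expands to "0,", shifting the args. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) \
	DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED(option)

/* Both initializers compile; a bare CONFIG_DEMO_OFF would not. */
static const bool demo_on = DEMO_IS_ENABLED(CONFIG_DEMO_ON);
static const bool demo_off = DEMO_IS_ENABLED(CONFIG_DEMO_OFF);
```

The real kernel macro additionally treats tristate options built as modules (`=m`) as enabled, which this bool-only sketch does not cover.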



-----Original Message-----
From: Guenter Roeck [mailto:linux@xxxxxxxxxxxx] 
Sent: Thursday, January 15, 2015 3:49 PM
To: Muller, Francois-nicolas
Cc: Darren Hart; 'platform-driver-x86@xxxxxxxxxxxxxxx'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@xxxxxxxxxxxxxxx; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On 01/15/2015 05:27 AM, Muller, Francois-nicolas wrote:
> The TCO driver is auto-loaded anyway by the MFD driver lpc_ich, as one of its sub-functions.
>
> The aim of my patch is only to add warning interrupt support to the TCO driver.
> For this, it needs the GPE number, which the BIOS exposes in the ACPI tables.
> So the patch also registers the TCO driver as an ACPI driver, to be able to retrieve this value.
>
> The ACPI code is only required for the interrupt handling support; it is not needed to load the driver itself (that is already done by lpc_ich).
>
> In this context, I don't see the point of separating the driver part from the loading, unless the loading needed to be reverted later.
>
>

Ok, makes sense.

Regarding the patch itself, I'll leave it up to Wim to decide what to do.
Personally, I dislike the notion of panicking in response to a watchdog timeout.

Guenter
