RE: [PATCH] TCO Watchdog warning interrupt driver creation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>From 54d2ff5e13c1b35d5019b82376dabb903ebe30d6 Mon Sep 17 00:00:00 2001
From: Francois-Nicolas Muller <francois-nicolas.muller@xxxxxxxxx>
Date: Wed, 14 Jan 2015 14:27:43 +0100
Subject: [PATCH] Adding TCO watchdog warning interrupt handling.

This feature is useful to root cause watchdog expiration.
It is activated by boot parameter 'warn_irq' (disabled by default).

Upon first expiration of the TCO watchdog, a warning interrupt is fired then the
interrupt handler dumps registers and call stack of all available cores.

Finally panic() is called and notifies the panic handlers if any. At the same
time, the TCO watchdog reloads with 2.4 seconds timeout value.

When warning interrupt is enabled, platform reboot depends on
CONFIG_PANIC_TIMEOUT value :

- If CONFIG_PANIC_TIMEOUT is zero or greater than 3 seconds, TCO watchdog will
reset the platform if second expiration happens before TCO has been kicked
again.

- If CONFIG_PANIC_TIMEOUT is < 0, platform will reboot immediately (emergency
restart procedure).

- If CONFIG_PANIC_TIMEOUT is 1 or 2 seconds, platform will reboot after 1 or 2
seconds delay (emergency restart procedure).

Change-Id: I009c41f2f3dc3bd091b4d2a45b4ea0be85c8ce27
Signed-off-by: Francois-Nicolas Muller <francois-nicolas.muller@xxxxxxxxx>
---
 drivers/watchdog/iTCO_wdt.c | 64 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index e802a54..a8c16d2 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -49,6 +49,8 @@
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
 #define DRV_VERSION	"1.11"
+#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
+#define TCO_CLASS	DRV_NAME
 
 /* Includes */
 #include <linux/module.h>		/* For module specific items */
@@ -68,6 +70,9 @@
 #include <linux/pm.h>			/* For suspend/resume */
 #include <linux/mfd/core.h>
 #include <linux/mfd/lpc_ich.h>
+#include <linux/nmi.h>
+#include <linux/acpi.h>
+#include <acpi/actypes.h>
 
 #include "iTCO_vendor.h"
 
@@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
 	bool started;
 } iTCO_wdt_private;
 
+static const struct acpi_device_id iTCO_wdt_ids[] = {
+	{"8086229C", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
+
 /* module parameters */
 #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
 static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */
@@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);
 MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
 	"Turn off SMI clearing watchdog (depends on TCO-version)(default=1)");
 
+static bool warn_irq;
+module_param(warn_irq, bool, 0);
+MODULE_PARM_DESC(warn_irq,
+	"Watchdog trigs a panic at first expiration (default=0)");
+
 /*
  * Some TCO specific functions
  */
@@ -200,6 +216,35 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
 	return ret; /* returns: 0 = OK, -EIO = Error */
 }
 
+static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void *context)
+{
+	trigger_all_cpu_backtrace();
+	panic("Kernel Watchdog");
+
+	/* This code should not be reached */
+	return IRQ_HANDLED;
+}
+
+static int iTCO_wdt_acpi_add(struct acpi_device *device)
+{
+	unsigned long long gpe;
+	acpi_status status;
+
+	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
+	if (ACPI_FAILURE(status))
+		return -EINVAL;
+
+	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
+					  iTCO_wdt_wirq, NULL);
+	if (ACPI_FAILURE(status))
+		return -ENODEV;
+
+	acpi_enable_gpe(NULL, gpe);
+
+	pr_debug("interrupt=SCI GPE=0x%02llx", gpe);
+	return 0;
+}
+
 static int iTCO_wdt_start(struct watchdog_device *wd_dev)
 {
 	unsigned int val;
@@ -628,6 +673,15 @@ static struct platform_driver iTCO_wdt_driver = {
 	},
 };
 
+static struct acpi_driver iTCO_wdt_acpi_driver = {
+	.name = DRV_NAME_ACPI,
+	.class = TCO_CLASS,
+	.ids = iTCO_wdt_ids,
+	.ops = {
+		.add = iTCO_wdt_acpi_add,
+	},
+};
+
 static int __init iTCO_wdt_init_module(void)
 {
 	int err;
@@ -638,12 +692,22 @@ static int __init iTCO_wdt_init_module(void)
 	if (err)
 		return err;
 
+	if (warn_irq) {
+		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
+		if (err) {
+			platform_driver_unregister(&iTCO_wdt_driver);
+			return err;
+		}
+	}
+
 	return 0;
 }
 
 static void __exit iTCO_wdt_cleanup_module(void)
 {
 	platform_driver_unregister(&iTCO_wdt_driver);
+	if (warn_irq)
+		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
 	pr_info("Watchdog Module Unloaded\n");
 }
 
-- 
1.9.1


-----Original Message-----
From: Muller, Francois-nicolas 
Sent: Wednesday, January 14, 2015 5:33 PM
To: 'Guenter Roeck'; Darren Hart
Cc: 'platform-driver-x86@xxxxxxxxxxxxxxx'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@xxxxxxxxxxxxxxx; Wim Van Sebroeck
Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation

Thanks for your review, my comments embedded below.
I'm sending the new version of the patch in the following email.
François-Nicolas

>-----Original Message-----
>From: Guenter Roeck [mailto:linux@xxxxxxxxxxxx]
>Sent: Friday, January 09, 2015 6:09 AM
>To: Muller, Francois-nicolas; Darren Hart
>Cc: 'platform-driver-x86@xxxxxxxxxxxxxxx'; Rafael Wysocki; Linux ACPI 
>Mailing List; linux-watchdog@xxxxxxxxxxxxxxx; Wim Van Sebroeck
>Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation
>
>On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>> Adding TCO watchdog warning interrupt handling.
>>
>> This feature is useful to root cause watchdog expiration.
>> It is activated by boot parameter 'warn_irq' (disabled by default).
>>
>> Upon first expiration of the TCO watchdog, a warning interrupt is 
>> fired and the driver calls panic() to dump debug information 
>> (registers and call stacks).
>>
>> At the same time, the TCO watchdog reloads with 2.4 seconds timeout 
>> value and runs until the second expiration.
>>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@xxxxxxxxx>
>> ---
>>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 66 insertions(+)
>>
>> diff --git a/drivers/watchdog/iTCO_wdt.c 
>> b/drivers/watchdog/iTCO_wdt.c index e802a54..1ce9ad8 100644
>> --- a/drivers/watchdog/iTCO_wdt.c
>> +++ b/drivers/watchdog/iTCO_wdt.c
>> @@ -49,6 +49,8 @@
>>   /* Module and version information */
>>   #define DRV_NAME	"iTCO_wdt"
>>   #define DRV_VERSION	"1.11"
>> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
>> +#define TCO_CLASS	DRV_NAME
>>
>>   /* Includes */
>>   #include <linux/module.h>		/* For module specific items */
>> @@ -68,6 +70,9 @@
>>   #include <linux/pm.h>			/* For suspend/resume */
>>   #include <linux/mfd/core.h>
>>   #include <linux/mfd/lpc_ich.h>
>> +#include <linux/nmi.h>
>> +#include <linux/acpi.h>
>> +#include <acpi/actypes.h>
>>
>>   #include "iTCO_vendor.h"
>>
>> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>>   	bool started;
>>   } iTCO_wdt_private;
>>
>> +static const struct acpi_device_id iTCO_wdt_ids[] = {
>> +	{"8086229C", 0},
>> +	{"", 0},
>> +};
>> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
>> +
>
>Doesn't that result in auto-loading the driver ?

The driver is loaded only when Bios exposes the acpi device and the boot param warn_irq is true.
Could you clarify ? I don't understand the issue. 

>
>If so, if that is really desirable, it should be a separate patch.
>
>>   /* module parameters */
>>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>>   	"Turn off SMI clearing watchdog (depends on 
>> TCO-version)(default=1)");
>>
>
>Really looks like the patch is corrupted, with lines merged.

Yes sorry for this, my mail reader doesn't handle well plain text.

>
>> +static bool warn_irq;
>> +module_param(warn_irq, bool, 0);
>> +MODULE_PARM_DESC(warn_irq,
>> +	"Watchdog trigs a panic at first expiration (default=0)");
>> +
>>   /*
>>    * Some TCO specific functions
>>    */
>> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>>
>> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
>> +*context) {
>
>Either the patch is corrupted, or you need to look into kernel coding style rules.
>
>> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
>> +
>That is kind of redundant with the following panic.
>

Sure I will remove it.

>> +	trigger_all_cpu_backtrace();
>> +	panic("Kernel Watchdog");
>> +
>
>Quite frankly, I don't see the point of this patch. Sure, this will dump the stack. Ok, but so what ? What value would it have for the user to see the interrupt call stack that is seen if the watchdog times out ?
>
>Guess I must be missing something.
>

Trigger_all_cpu_backtrace() is useful to dump stacks of all running cpu (if any) by triggering an NMI to them.
The cause of a cpu hang may be related to another cpu and all cpu stacks are helpful information.
Panic() also dumps the interrupt call stack, which is useless here, but notifies panic handlers which could add some debug information.
I think both are useful for these reasons.

>> +	/* This code should not be reached */
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
>> +	unsigned long long gpe;
>> +	acpi_status status;
>> +
>> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
>> +	if (ACPI_FAILURE(status))
>> +		return -EINVAL;
>> +
>> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
>> +					  iTCO_wdt_wirq, NULL);
>> +	if (ACPI_FAILURE(status))
>> +		return -ENODEV;
>> +
>> +	acpi_enable_gpe(NULL, gpe);
>> +
>> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);
>
>I fail to see the value of this log message.
>

I will change it to pr_debug

>> +	return 0;
>> +}
>> +
>>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>>   	unsigned int val;
>> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>>   	},
>>   };
>>
>> +static struct acpi_driver iTCO_wdt_acpi_driver = {
>> +	.name = DRV_NAME_ACPI,
>> +	.class = TCO_CLASS,
>> +	.ids = iTCO_wdt_ids,
>> +	.ops = {
>> +		.add = iTCO_wdt_acpi_add,
>> +	},
>> +};
>> +
>>   static int __init iTCO_wdt_init_module(void)  {
>>   	int err;
>> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>>   	if (err)
>>   		return err;
>>
>> +	if (warn_irq) {
>> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
>> +		if (err) {
>> +			platform_driver_unregister(&iTCO_wdt_driver);
>> +			return err;
>> +		}
>> +	}
>> +
>>   	return 0;
>>   }
>>
>>   static void __exit iTCO_wdt_cleanup_module(void)  {
>
>The original code is not formatted like this.
>
>>   	platform_driver_unregister(&iTCO_wdt_driver);
>> +	if (warn_irq)
>> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>>   	pr_info("Watchdog Module Unloaded\n");  }
>>
>> --
>> 1.7.9.5
>>
>>
>> -----Original Message-----
>> From: Darren Hart [mailto:dvhart@xxxxxxxxxxxxx]
>> Sent: Monday, December 22, 2014 4:41 PM
>> To: Muller, Francois-nicolas; Guenter Roeck
>> Cc: 'platform-driver-x86@xxxxxxxxxxxxxxx'; Rafael Wysocki; Linux ACPI 
>> Mailing List; linux-watchdog@xxxxxxxxxxxxxxx; Wim Van Sebroeck
>> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>>
>>
>>
>> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@xxxxxxxxx> wrote:
>>> Thanks Darren and Guenter for your review.
>>> Here is a new patchset, including the feature in the watchdog driver 
>>> and using a boot parameter to be enabled.
>>> François-Nicolas
>>>
>>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>>
>>
>> Please provide a complete commit message.
>>
>>> Signed-off-by: Francois-Nicolas Muller 
>>> <francois-nicolas.muller@xxxxxxxxx>
>>
>> --
>> Darren Hart
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>

-----Original Message-----
From: Guenter Roeck [mailto:linux@xxxxxxxxxxxx]
Sent: Friday, January 09, 2015 6:09 AM
To: Muller, Francois-nicolas; Darren Hart
Cc: 'platform-driver-x86@xxxxxxxxxxxxxxx'; Rafael Wysocki; Linux ACPI Mailing List; linux-watchdog@xxxxxxxxxxxxxxx; Wim Van Sebroeck
Subject: Re: [PATCH] TCO Watchdog warning interrupt driver creation

On 12/22/2014 08:07 AM, Muller, Francois-nicolas wrote:
> Subject: [PATCH] TCO watchdog warning interrupt handler
>
> Adding TCO watchdog warning interrupt handling.
>
> This feature is useful to root cause watchdog expiration.
> It is activated by boot parameter 'warn_irq' (disabled by default).
>
> Upon first expiration of the TCO watchdog, a warning interrupt is 
> fired and the driver calls panic() to dump debug information 
> (registers and call stacks).
>
> At the same time, the TCO watchdog reloads with 2.4 seconds timeout 
> value and runs until the second expiration.
>
> Signed-off-by: Francois-Nicolas Muller 
> <francois-nicolas.muller@xxxxxxxxx>
> ---
>   drivers/watchdog/iTCO_wdt.c |   66 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 66 insertions(+)
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c 
> index e802a54..1ce9ad8 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -49,6 +49,8 @@
>   /* Module and version information */
>   #define DRV_NAME	"iTCO_wdt"
>   #define DRV_VERSION	"1.11"
> +#define DRV_NAME_ACPI	"iTCO_wdt_wirq"
> +#define TCO_CLASS	DRV_NAME
>
>   /* Includes */
>   #include <linux/module.h>		/* For module specific items */
> @@ -68,6 +70,9 @@
>   #include <linux/pm.h>			/* For suspend/resume */
>   #include <linux/mfd/core.h>
>   #include <linux/mfd/lpc_ich.h>
> +#include <linux/nmi.h>
> +#include <linux/acpi.h>
> +#include <acpi/actypes.h>
>
>   #include "iTCO_vendor.h"
>
> @@ -107,6 +112,12 @@ static struct {		/* this is private data for the iTCO_wdt device */
>   	bool started;
>   } iTCO_wdt_private;
>
> +static const struct acpi_device_id iTCO_wdt_ids[] = {
> +	{"8086229C", 0},
> +	{"", 0},
> +};
> +MODULE_DEVICE_TABLE(acpi, iTCO_wdt_ids);
> +

Doesn't that result in auto-loading the driver ?

If so, if that is really desirable, it should be a separate patch.

>   /* module parameters */
>   #define WATCHDOG_TIMEOUT 30	/* 30 sec default heartbeat */
>   static int heartbeat = WATCHDOG_TIMEOUT;  /* in seconds */ @@ -126,6 +137,11 @@ module_param(turn_SMI_watchdog_clear_off, int, 0);  MODULE_PARM_DESC(turn_SMI_watchdog_clear_off,
>   	"Turn off SMI clearing watchdog (depends on 
> TCO-version)(default=1)");
>

Really looks like the patch is corrupted, with lines merged.

> +static bool warn_irq;
> +module_param(warn_irq, bool, 0);
> +MODULE_PARM_DESC(warn_irq,
> +	"Watchdog trigs a panic at first expiration (default=0)");
> +
>   /*
>    * Some TCO specific functions
>    */
> @@ -200,6 +216,37 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(void)
>   	return ret; /* returns: 0 = OK, -EIO = Error */  }
>
> +static u32 iTCO_wdt_wirq(acpi_handle gpe_device, u32 gpe, void
> +*context) {

Either the patch is corrupted, or you need to look into kernel coding style rules.

> +	pr_warn("[SHTDWN] %s, WATCHDOG TIMEOUT HANDLER!\n", __func__);
> +
That is kind of redundant with the following panic.

> +	trigger_all_cpu_backtrace();
> +	panic("Kernel Watchdog");
> +

Quite frankly, I don't see the point of this patch. Sure, this will dump the stack. Ok, but so what ? What value would it have for the user to see the interrupt call stack that is seen if the watchdog times out ?

Guess I must be missing something.

> +	/* This code should not be reached */
> +	return IRQ_HANDLED;
> +}
> +
> +static int iTCO_wdt_acpi_add(struct acpi_device *device) {
> +	unsigned long long gpe;
> +	acpi_status status;
> +
> +	status = acpi_evaluate_integer(device->handle, "_GPE", NULL, &gpe);
> +	if (ACPI_FAILURE(status))
> +		return -EINVAL;
> +
> +	status = acpi_install_gpe_handler(NULL, gpe, ACPI_GPE_EDGE_TRIGGERED,
> +					  iTCO_wdt_wirq, NULL);
> +	if (ACPI_FAILURE(status))
> +		return -ENODEV;
> +
> +	acpi_enable_gpe(NULL, gpe);
> +
> +	pr_info("interrupt=SCI GPE=0x%02llx", gpe);

I fail to see the value of this log message.

> +	return 0;
> +}
> +
>   static int iTCO_wdt_start(struct watchdog_device *wd_dev)  {
>   	unsigned int val;
> @@ -628,6 +675,15 @@ static struct platform_driver iTCO_wdt_driver = {
>   	},
>   };
>
> +static struct acpi_driver iTCO_wdt_acpi_driver = {
> +	.name = DRV_NAME_ACPI,
> +	.class = TCO_CLASS,
> +	.ids = iTCO_wdt_ids,
> +	.ops = {
> +		.add = iTCO_wdt_acpi_add,
> +	},
> +};
> +
>   static int __init iTCO_wdt_init_module(void)  {
>   	int err;
> @@ -638,12 +694,22 @@ static int __init iTCO_wdt_init_module(void)
>   	if (err)
>   		return err;
>
> +	if (warn_irq) {
> +		err = acpi_bus_register_driver(&iTCO_wdt_acpi_driver);
> +		if (err) {
> +			platform_driver_unregister(&iTCO_wdt_driver);
> +			return err;
> +		}
> +	}
> +
>   	return 0;
>   }
>
>   static void __exit iTCO_wdt_cleanup_module(void)  {

The original code is not formatted like this.

>   	platform_driver_unregister(&iTCO_wdt_driver);
> +	if (warn_irq)
> +		acpi_bus_unregister_driver(&iTCO_wdt_acpi_driver);
>   	pr_info("Watchdog Module Unloaded\n");  }
>
> --
> 1.7.9.5
>
>
> -----Original Message-----
> From: Darren Hart [mailto:dvhart@xxxxxxxxxxxxx]
> Sent: Monday, December 22, 2014 4:41 PM
> To: Muller, Francois-nicolas; Guenter Roeck
> Cc: 'platform-driver-x86@xxxxxxxxxxxxxxx'; Rafael Wysocki; Linux ACPI 
> Mailing List; linux-watchdog@xxxxxxxxxxxxxxx; Wim Van Sebroeck
> Subject: RE: [PATCH] TCO Watchdog warning interrupt driver creation
>
>
>
> On December 22, 2014 5:18:33 AM PST, "Muller, Francois-nicolas" <francois-nicolas.muller@xxxxxxxxx> wrote:
>> Thanks Darren and Guenter for your review.
>> Here is a new patchset, including the feature in the watchdog driver 
>> and using a boot parameter to be enabled.
>> François-Nicolas
>>
>> Subject: [PATCH] TCO watchdog warning interrupt handler
>>
>
> Please provide a complete commit message.
>
>> Signed-off-by: Francois-Nicolas Muller 
>> <francois-nicolas.muller@xxxxxxxxx>
>
> --
> Darren Hart
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>

��.n��������+%������w��{.n�����{�����ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux