+ kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Subject: + kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one.patch added to -mm tree
To: benzh@xxxxxxxxxxxx,dzickus@xxxxxxxxxx
From: akpm@xxxxxxxxxxxxxxxxxxxx
Date: Mon, 13 Jan 2014 12:03:13 -0800


The patch titled
     Subject: kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one
has been added to the -mm tree.  Its filename is
     kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Ben Zhang <benzh@xxxxxxxxxxxx>
Subject: kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every one

I ran into a scenario where while one cpu was stuck and should have
panic'd because of the NMI watchdog, it didn't.  The reason was another
cpu was spewing stack dumps on to the console.  Upon investigation, I
noticed that when writing to the console and also when dumping the stack,
the watchdog is touched.

This causes all the cpus to reset their NMI watchdog flags and the 'stuck'
cpu just spins forever.

This change causes the semantics of touch_nmi_watchdog to be changed
slightly.  Previously, I accidentally changed the semantics and we noticed
there was a codepath in which touch_nmi_watchdog could be touched from a
preemtible area.  That caused a BUG() to happen when CONFIG_DEBUG_PREEMPT
was enabled.  I believe it was the acpi code.

My attempt here re-introduces the change to have the touch_nmi_watchdog()
code only touch the local cpu instead of all of the cpus.  But instead of
using __get_cpu_var(), I use the __raw_get_cpu_var() version.

This avoids the preemption problem.  However my reasoning wasn't because I
was trying to be lazy.  Instead I rationalized it as, well if preemption
is enabled then interrupts should be enabled to and the NMI watchdog will
have no reason to trigger.  So it won't matter if the wrong cpu is touched
because the percpu interrupt counters the NMI watchdog uses should still
be incrementing.

Don said:

: I'm ok with this patch, though it does alter the behaviour of how
: touch_nmi_watchdog works.  For the most part I don't think most callers
: need to touch all of the watchdogs (on each cpu).  Perhaps a corner case
: will pop up (the scheduler??  to mimic touch_all_softlockup_watchdogs() ).
: 
: But this does address an issue where if a system is locked up and one cpu
: is spewing out useful debug messages (or error messages), the hard lockup
: will fail to go off.  We have seen this on RHEL also.

Signed-off-by: Don Zickus <dzickus@xxxxxxxxxx>
Signed-off-by: Ben Zhang <benzh@xxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 kernel/watchdog.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff -puN kernel/watchdog.c~kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one kernel/watchdog.c
--- a/kernel/watchdog.c~kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one
+++ a/kernel/watchdog.c
@@ -158,14 +158,14 @@ void touch_all_softlockup_watchdogs(void
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	if (watchdog_user_enabled) {
-		unsigned cpu;
-
-		for_each_present_cpu(cpu) {
-			if (per_cpu(watchdog_nmi_touch, cpu) != true)
-				per_cpu(watchdog_nmi_touch, cpu) = true;
-		}
-	}
+	/*
+	 * Using __raw here because some code paths have
+	 * preemption enabled.  If preemption is enabled
+	 * then interrupts should be enabled too, in which
+	 * case we shouldn't have to worry about the watchdog
+	 * going off.
+	 */
+	__raw_get_cpu_var(watchdog_nmi_touch) = true;
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
_

Patches currently in -mm which might be from benzh@xxxxxxxxxxxx are

kernel-watchdogc-touch_nmi_watchdog-should-only-touch-local-cpu-not-every-one.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Kernel Newbies FAQ]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux