+ watchdog-fix-possible-soft-lockup-warning-at-bootup.patch added to -mm tree

The patch titled
     Subject: kernel/watchdog.c: fix possible soft lockup warning at bootup
has been added to the -mm tree.  Its filename is
     watchdog-fix-possible-soft-lockup-warning-at-bootup.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/watchdog-fix-possible-soft-lockup-warning-at-bootup.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/watchdog-fix-possible-soft-lockup-warning-at-bootup.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Waiman Long <longman@xxxxxxxxxx>
Subject: kernel/watchdog.c: fix possible soft lockup warning at bootup

A watchdog soft lockup warning was observed at bootup on some arm64
server systems:

 [   25.496379] watchdog: BUG: soft lockup - CPU#14 stuck for 22s!  [swapper/14:0]
 [   25.496381] Modules linked in:
 [   25.496386] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G        W    L --------- -  - 4.18.0-rhel8.1+ #9
 [   25.496388] pstate: 60000009 (nZCv daif -PAN -UAO)
 [   25.496393] pc : arch_cpu_idle+0x34/0x140
 [   25.496395] lr : arch_cpu_idle+0x30/0x140
 [   25.496397] sp : ffff000021f4ff10
 [   25.496398] x29: ffff000021f4ff10 x28: 0000000000000000
 [   25.496401] x27: 0000000000000000 x26: ffff809f483c0000
 [   25.496404] x25: 0000000000000000 x24: ffff00001145c03c
 [   25.496407] x23: ffff00001110c9f8 x22: ffff000011453708
 [   25.496410] x21: ffff00001145bffc x20: 0000000000004000
 [   25.496413] x19: ffff0000110f0018 x18: 0000000000000010
 [   25.496416] x17: 0000000000000cc8 x16: 0000000000000000
 [   25.496419] x15: ffffffffffffffff x14: ffff000011453708
 [   25.496422] x13: ffff000091cc5caf x12: ffff000011cc5cb7
 [   25.496424] x11: 6572203030642072 x10: 0000000000000d10
 [   25.496427] x9 : ffff000021f4fe80 x8 : ffff809f483c0d70
 [   25.496430] x7 : 00000000b123f581 x6 : 00000000ffff8ae1
 [   25.496433] x5 : 00000000ffffffff x4 : 0000809f6ac90000
 [   25.496436] x3 : 4000000000000000 x2 : ffff809f7bd9e9c0
 [   25.496439] x1 : ffff0000110f0018 x0 : ffff000021f4ff10
 [   25.496441] Call trace:
 [   25.496444]  arch_cpu_idle+0x34/0x140
 [   25.496447]  do_idle+0x210/0x288
 [   25.496449]  cpu_startup_entry+0x2c/0x30
 [   25.496452]  secondary_start_kernel+0x124/0x138

Further analysis revealed that the smp_init() call itself took more
than 20s on that 2-socket, 56-core, 224-thread server.

 [    0.115632] CPU1: Booted secondary processor 0x0000000100 [0x431f0af1]
   :
 [   27.177282] CPU223: Booted secondary processor 0x0000011b03 [0x431f0af1]

Instrumentation showed that for CPU 14, watchdog_enable() was called
early, with a timestamp of 1. The first watchdog timer callback for that
CPU, however, did not run until the 25s timestamp mark shown above, so
the watchdog logic treated the delay as a soft lockup.
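
For illustration only, the arithmetic behind that warning can be modelled
in user space. This is a simplified sketch, not the code in
kernel/watchdog.c. It assumes the soft lockup threshold is 20s (twice the
default watchdog_thresh of 10) and that the watchdog timestamp is the
nanosecond clock shifted right by 30 bits, which is only roughly seconds.
All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default soft lockup threshold, 2 * watchdog_thresh (10s). */
#define MODEL_SOFTLOCKUP_THRESH 20UL

/* Assumption: coarse timestamp is nanoseconds >> 30 (2^30 ~= 10^9). */
static unsigned long model_timestamp(uint64_t ns)
{
	return (unsigned long)(ns >> 30);
}

/*
 * Returns the apparent stall in coarse seconds, or 0 if the time since
 * the last touch is within the threshold.
 */
static unsigned long model_is_softlockup(unsigned long touch_ts,
					 unsigned long now)
{
	if (now > touch_ts + MODEL_SOFTLOCKUP_THRESH)
		return now - touch_ts;
	return 0;
}

/* Usage example with the timestamps quoted in this changelog. */
void model_selfcheck(void)
{
	/* Touched at 1, first callback at 25.496379s: reported as a stall. */
	assert(model_is_softlockup(1, model_timestamp(25496379000ULL)) == 22);
	/* Touched at 1, first callback at about 5s: no warning. */
	assert(model_is_softlockup(1, model_timestamp(5000000000ULL)) == 0);
}
```

Under these assumptions the late callback yields a stall of 22, which is
consistent with the "stuck for 22s" in the log above.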

On another arm64 system that doesn't show the soft lockup warning, the
watchdog timer callback happened earlier at the 5s timestamp mark with
the watchdog thread invoked shortly after that.

The reason for the long delay before the first watchdog timer callback
on that particular system is not yet fully understood. Given that
smp_init() can run for a long time on some systems, it is probably
more appropriate to enable the watchdog function after smp_init()
instead of before it.

Another way is to leave watchdog_touch_ts at 0 in watchdog_enable()
while the system is still booting. Either change should be able to
eliminate the soft lockup warning at bootup; this patch makes both.
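
A rough user-space sketch of why a zero timestamp helps, assuming (as in
kernels of this era, to the best of my reading) that the timer callback
treats watchdog_touch_ts == 0 as "no valid timestamp yet" and simply
records the current time instead of checking for a lockup. The model_*
names, the struct and the 20s threshold are assumptions made for the
example; the real per-CPU state and the watchdog thread are not modelled.

```c
enum model_state { MODEL_BOOTING, MODEL_RUNNING };

struct model_cpu {
	unsigned long touch_ts;	/* 0 means "no valid timestamp yet" */
};

/* Assumption: default soft lockup threshold of 20s. */
#define MODEL_THRESH 20UL

/* Mirrors the patched watchdog_enable(): skip the touch while booting. */
static void model_enable(struct model_cpu *c, enum model_state state,
			 unsigned long now)
{
	c->touch_ts = 0;
	if (state != MODEL_BOOTING)
		c->touch_ts = now;
}

/* Timer callback: returns the apparent stall, or 0 if nothing to report. */
static unsigned long model_timer(struct model_cpu *c, unsigned long now)
{
	if (c->touch_ts == 0) {
		/* The first callback only arms the detector. */
		c->touch_ts = now;
		return 0;
	}
	if (now > c->touch_ts + MODEL_THRESH)
		return now - c->touch_ts;
	return 0;
}
```

In this model, enabling at timestamp 1 while booting and taking the first
callback at 23 reports nothing and arms the detector at 23. Enabling at 1
with the timestamp touched immediately (the old behaviour) reports a
stall of 22 at the same callback.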

Link: http://lkml.kernel.org/r/20200102154149.7564-1-longman@xxxxxxxxxx
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Mike Rapoport <rppt@xxxxxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 init/main.c       |    2 +-
 kernel/watchdog.c |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

--- a/init/main.c~watchdog-fix-possible-soft-lockup-warning-at-bootup
+++ a/init/main.c
@@ -1205,9 +1205,9 @@ static noinline void __init kernel_init_
 	init_mm_internals();
 
 	do_pre_smp_initcalls();
-	lockup_detector_init();
 
 	smp_init();
+	lockup_detector_init();
 	sched_init_smp();
 
 	page_alloc_init_late();
--- a/kernel/watchdog.c~watchdog-fix-possible-soft-lockup-warning-at-bootup
+++ a/kernel/watchdog.c
@@ -496,7 +496,9 @@ static void watchdog_enable(unsigned int
 		      HRTIMER_MODE_REL_PINNED_HARD);
 
 	/* Initialize timestamp */
-	__touch_watchdog();
+	if (system_state != SYSTEM_BOOTING)
+		__touch_watchdog();
+
 	/* Enable the perf event */
 	if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
 		watchdog_nmi_enable(cpu);
_

Patches currently in -mm which might be from longman@xxxxxxxxxx are

mm-hugetlb-defer-freeing-of-huge-pages-if-in-non-task-context.patch
watchdog-fix-possible-soft-lockup-warning-at-bootup.patch



