Patch "powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic" has been added to the 5.15-stable tree

This is a note to let you know that I've just added the patch titled

    powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     powerpc-fadump-fix-inaccurate-cpu-state-info-in-vmco.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 40a6b0b1fbf310cad1133345bd0f5ebb5e5df6aa
Author: Hari Bathini <hbathini@xxxxxxxxxxxxx>
Date:   Tue Dec 7 16:07:19 2021 +0530

    powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
    
    [ Upstream commit 06e629c25daa519be620a8c17359ae8fc7a2e903 ]
    
    In the panic path, fadump is triggered via a panic notifier function.
    Before calling panic notifier functions, smp_send_stop() gets called,
    which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc
    ("powerpc: stop_this_cpu: remove the cpu from the online map.") and
    again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
    started marking CPUs as offline while stopping them. So, if a kernel
    has either of the above commits, post-processing of a vmcore captured
    with fadump via the panic path handles register data only for the
    panic'ing CPU and skips all other CPUs, because they are marked
    offline. Sample output of crash-utility with such a vmcore:
    
      # crash vmlinux vmcore
      ...
            KERNEL: vmlinux
          DUMPFILE: vmcore  [PARTIAL DUMP]
              CPUS: 1
              DATE: Wed Nov 10 09:56:34 EST 2021
            UPTIME: 00:00:42
      LOAD AVERAGE: 2.27, 0.69, 0.24
             TASKS: 183
          NODENAME: XXXXXXXXX
           RELEASE: 5.15.0+
           VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
           MACHINE: ppc64le  (2500 Mhz)
            MEMORY: 8 GB
             PANIC: "Kernel panic - not syncing: sysrq triggered crash"
               PID: 3394
           COMMAND: "bash"
              TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
               CPU: 1
             STATE: TASK_RUNNING (PANIC)
    
      crash> p -x __cpu_online_mask
      __cpu_online_mask = $1 = {
        bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
      }
      crash>
      crash>
      crash> p -x __cpu_active_mask
      __cpu_active_mask = $2 = {
        bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
      }
      crash>
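
    The two masks above can be decoded by hand: 0x2 has only bit 1 set,
    so only the panic'ing CPU (CPU 1) is still marked online, while 0xff
    has bits 0 through 7 set, so all eight CPUs are still marked active.
    A minimal user-space sketch of that decoding follows. The helper
    names mask_weight(), mask_test_cpu() and print_mask() are
    hypothetical and are not kernel or crash-utility APIs; only the
    first word of each mask is considered.

```c
#include <assert.h>
#include <stdio.h>

/* Count the CPUs whose bit is set in one word of a cpumask. */
int mask_weight(unsigned long word)
{
	int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; word; word &= word - 1)
		n++;
	return n;
}

/* Return nonzero if the bit for the given CPU is set in the word. */
int mask_test_cpu(unsigned long word, unsigned int cpu)
{
	assert(cpu < 8 * sizeof(word));
	return (int)((word >> cpu) & 1UL);
}

/* Print the CPU numbers whose bits are set in the word. */
void print_mask(const char *name, unsigned long word)
{
	unsigned int cpu;

	printf("%s:", name);
	for (cpu = 0; cpu < 8 * sizeof(word); cpu++)
		if (mask_test_cpu(word, cpu))
			printf(" %u", cpu);
	printf("\n");
}
```

    Calling print_mask("online", 0x2UL) and print_mask("active", 0xffUL)
    lists the CPUs covered by the first word of each mask shown in the
    crash session.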
    
    While this has been the case since fadump was introduced, the issue
    was not identified for two probable reasons:
    
      - In general, the bulk of the vmcores analyzed were from crashes
        due to exceptions.
    
      - The above did change since commit 8341f2f222d7 ("sysrq: Use
        panic() to force a crash") started using panic() instead of
        dereferencing a NULL pointer to force a kernel crash. But then
        commit de6e5d38417e ("powerpc: smp_send_stop do not offline
        stopped CPUs") stopped marking CPUs as offline until commit
        bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()")
        reverted that change.
    
    To ensure that post-processing of register data for all other CPUs
    happens as intended, let the panic() function take the crash-friendly
    path (read crash_smp_send_stop()) with the help of the
    crash_kexec_post_notifiers option. Also, as register data for all
    CPUs is captured by f/w, skip IPI callbacks here for fadump, to avoid
    any complications in finding the right backtraces.
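
    The effect of the change on the online mask can be illustrated with
    a toy user-space model. This is only a sketch under stated
    assumptions, not the kernel implementation: struct model and the
    model_*() functions are hypothetical names, the online mask is a
    single word, and the non-fadump crash handling in
    crash_smp_send_stop() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Toy state; field names echo the kernel's, but this is not kernel code. */
struct model {
	unsigned long online_mask;       /* stands in for __cpu_online_mask */
	bool crash_kexec_post_notifiers; /* the option this patch sets */
	bool fadump_active;              /* stands in for should_fadump_crash() */
};

/* Regular stop path: every CPU except the caller is marked offline. */
void model_smp_send_stop(struct model *m, unsigned int self)
{
	assert(self < NR_CPUS);
	m->online_mask &= 1UL << self;
}

/* Crash-friendly stop path: with fadump, return before touching other CPUs. */
void model_crash_smp_send_stop(struct model *m, unsigned int self)
{
	assert(self < NR_CPUS);
	if (m->fadump_active)
		return; /* f/w captures register data for all CPUs */
	/* Non-fadump crash handling is out of scope for this sketch. */
}

/* Return the online mask that the panic notifiers would observe. */
unsigned long model_panic(struct model *m, unsigned int self)
{
	if (!m->crash_kexec_post_notifiers)
		model_smp_send_stop(m, self);
	else
		model_crash_smp_send_stop(m, self);

	/* Panic notifiers, including the one that triggers fadump, run here. */
	return m->online_mask;
}
```

    With eight CPUs online and a panic on CPU 1, the model yields 0x2
    when crash_kexec_post_notifiers is false, matching the crash output
    above, and leaves the mask at 0xff when it is true.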
    
    Signed-off-by: Hari Bathini <hbathini@xxxxxxxxxxxxx>
    Signed-off-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@xxxxxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index b7ceb041743c9..60f5fc14aa235 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1641,6 +1641,14 @@ int __init setup_fadump(void)
 	else if (fw_dump.reserve_dump_area_size)
 		fw_dump.ops->fadump_init_mem_struct(&fw_dump);
 
+	/*
+	 * In case of panic, fadump is triggered via ppc_panic_event()
+	 * panic notifier. Setting crash_kexec_post_notifiers to 'true'
+	 * lets panic() function take crash friendly path before panic
+	 * notifiers are invoked.
+	 */
+	crash_kexec_post_notifiers = true;
+
 	return 1;
 }
 subsys_initcall(setup_fadump);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index d03823aa7e4de..fb95f92dcfac6 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -61,6 +61,7 @@
 #include <asm/cpu_has_feature.h>
 #include <asm/ftrace.h>
 #include <asm/kup.h>
+#include <asm/fadump.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -638,6 +639,15 @@ void crash_smp_send_stop(void)
 {
 	static bool stopped = false;
 
+	/*
+	 * In case of fadump, register data for all CPUs is captured by f/w
+	 * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
+	 * this rtas call to avoid tricky post processing of those CPUs'
+	 * backtraces.
+	 */
+	if (should_fadump_crash())
+		return;
+
 	if (stopped)
 		return;
 


