Here is the revised commit description reflecting Dave's comment. Cc list was copied from -mm version. From: Hidehiro Kawai <hidehiro.kawai.ez@xxxxxxxxxxx> Subject: x86/panic: replace smp_send_stop() with kdump friendly version in panic path This patch fixes a problem reported by Daniel Walker (https://lkml.org/lkml/2015/6/24/44). When kernel panics with crash_kexec_post_notifiers kernel parameter enabled, other CPUs are stopped by smp_send_stop() instead of machine_crash_shutdown() in __crash_kexec() path. panic() if crash_kexec_post_notifiers == 1 smp_send_stop() atomic_notifier_call_chain() kmsg_dump() __crash_kexec() machine_crash_shutdown() Different from smp_send_stop(), machine_crash_shutdown() stops other CPUs with extra works for kdump. So, if smp_send_stop() stops other CPUs in advance, these extra works won't be done. For x86, kdump routines miss to save other CPUs' registers and disable virtualization extensions. To fix this problem, call a new kdump friendly function, crash_smp_send_stop(), instead of the smp_send_stop() when crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak function, and it just call smp_send_stop(). Architecture codes should override it so that kdump can work appropriately. This patch only provides x86-specific version. For Xen's PV kernel, just keep the current behavior. As for Dom0, it doesn't use crash_kexec routines, and it relies on panic notifier chain. At the end of the chain, a hypercall is issued which requests the hypervisor to execute kdump. This means regardless of crash_kexec_post_notifiers setting, smp_send_stop(). For PV HVM, it would work similarly to baremetal kernels with extra cleanups for hypervisor. It doesn't need additional care. Changes in V4: - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set - Rename panic_smp_send_stop to crash_smp_send_stop - Don't change the behavior for Xen's PV kernel Changes in V3: - Revise comments, description, and symbol names Changes in V2: - Replace smp_send_stop() call with crash_kexec version which saves cpu states and cleans up VMX/SVM - Drop a fix for Problem 1 at this moment Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@xxxxxxxxxxx> Reported-by: Daniel Walker <dwalker@xxxxxxxxxx> Cc: Dave Young <dyoung@xxxxxxxxxx> Cc: Baoquan He <bhe@xxxxxxxxxx> Cc: Vivek Goyal <vgoyal@xxxxxxxxxx> Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx> Cc: Daniel Walker <dwalker@xxxxxxxxxx> Cc: Xunlei Pang <xpang@xxxxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> Cc: Borislav Petkov <bp@xxxxxxx> Cc: David Vrabel <david.vrabel@xxxxxxxxxx> Cc: Toshi Kani <toshi.kani@xxxxxxx> Cc: Ralf Baechle <ralf@xxxxxxxxxxxxxx> Cc: David Daney <david.daney@xxxxxxxxxx> Cc: Aaro Koskinen <aaro.koskinen@xxxxxx> Cc: "Steven J. Hill" <steven.hill@xxxxxxxxxx> Cc: Corey Minyard <cminyard@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > -----Original Message----- > From: Hidehiro Kawai > Hi Xunlei, > > > From: Xunlei Pang [mailto:xpang@xxxxxxxxxx] > > Sent: Tuesday, September 20, 2016 4:40 PM > > On 2016/08/15/ at 19:22, Hidehiro Kawai wrote: > > > Hi Dave, > > > > > > Thank you for the review. > > > > > >> From: Dave Young [mailto:dyoung@xxxxxxxxxx] > > >> Sent: Friday, August 12, 2016 12:17 PM > > >> > > >> Thanks for the update. > > >> On 08/10/16 at 05:09pm, Hidehiro Kawai wrote: > > >>> Daniel Walker reported problems which happens when > > >>> crash_kexec_post_notifiers kernel option is enabled > > >>> (https://lkml.org/lkml/2015/6/24/44). > > >>> > > >>> In that case, smp_send_stop() is called before entering kdump routines > > >>> which assume other CPUs are still online. As the result, for x86, > > >>> kdump routines fail to save other CPUs' registers and disable > > >>> virtualization extensions. > > >> Seems you simplified the changelog, but I think a little more details > > >> will be helpful to understand the patch. You know sometimes lkml.org > > >> does not work well. > > > So, I'll try another archives when I post patch set next time. > > > > Hi Hidehiro Kawai, > > > > What's the status of this patch set, are you going to send an updated version? > > Sorry, I misunderstood what Dave said, and I thought I don't > need to revise the patch set. > > Currently, this patch set is in -mm, then -next tree. > What I need to fix is only commit descriptions, so I'm going to > post revised descriptions as replies to this thread, and then > request to Andrew to replace them if there is no objection. > (or should I resend the whole patch set?) > > Regards, > Hidehiro Kawai > > > >>> To fix this problem, call a new kdump friendly function, > > >>> crash_smp_send_stop(), instead of the smp_send_stop() when > > >>> crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a > > >>> weak function, and it just call smp_send_stop(). Architecture > > >>> codes should override it so that kdump can work appropriately. > > >>> This patch only provides x86-specific version. > > >>> > > >>> For Xen's PV kernel, just keep the current behavior. > > >> Could you explain a bit about above Xen PV kernel behavior? > > >> > > >> BTW, this version looks better, I think I'm fine with this version > > >> besides of the questions about changelog. > > > As for Dom0 kernel, it doesn't use crash_kexec routines, and > > > it relies on panic notifier chain. At the end of the chain, > > > xen_panic_event is called, and it issues a hypercall which > > > requests Hypervisor to execute kdump. This means whether > > > crash_kexec_panic_notifiers is set or not, panic notifiers > > > are called after smp_send_stop. Even if we save registers > > > in Dom0 kernel, they seem to be ignored (Hypervisor is responsible > > > for that). This is why I kept the current behavior for Xen. > > > > > > For PV DomU kernel, kdump is not supported. For PV HVM > > > DomU, I'm not sure what will happen on panic because I > > > couldn't boot PV HVM DomU and test it. But I think it will > > > work similarly to baremetal kernels with extra cleanups > > > for Hypervisor. > > > > > > Best regards, > > > > > > Hidehiro Kawai > > > > > >>> Changes in V4: > > >>> - Keep to use smp_send_stop if crash_kexec_post_notifiers is not set > > >>> - Rename panic_smp_send_stop to crash_smp_send_stop > > >>> - Don't change the behavior for Xen's PV kernel > > >>> > > >>> Changes in V3: > > >>> - Revise comments, description, and symbol names > > >>> > > >>> Changes in V2: > > >>> - Replace smp_send_stop() call with crash_kexec version which > > >>> saves cpu states and cleans up VMX/SVM > > >>> - Drop a fix for Problem 1 at this moment > > >>> > > >>> Reported-by: Daniel Walker <dwalker@xxxxxxxxxx> > > >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) > > >>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@xxxxxxxxxxx> > > >>> Cc: Dave Young <dyoung@xxxxxxxxxx> > > >>> Cc: Baoquan He <bhe@xxxxxxxxxx> > > >>> Cc: Vivek Goyal <vgoyal@xxxxxxxxxx> > > >>> Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx> > > >>> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx> > > >>> Cc: Daniel Walker <dwalker@xxxxxxxxxx> > > >>> Cc: Xunlei Pang <xpang@xxxxxxxxxx> > > >>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > >>> Cc: Ingo Molnar <mingo@xxxxxxxxxx> > > >>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> > > >>> Cc: Borislav Petkov <bp@xxxxxxx> > > >>> Cc: David Vrabel <david.vrabel@xxxxxxxxxx> > > >>> Cc: Toshi Kani <toshi.kani@xxxxxxx> > > >>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > >>> --- > > >>> arch/x86/include/asm/kexec.h | 1 + > > >>> arch/x86/include/asm/smp.h | 1 + > > >>> arch/x86/kernel/crash.c | 22 +++++++++++++++++--- > > >>> arch/x86/kernel/smp.c | 5 ++++ > > >>> kernel/panic.c | 47 ++++++++++++++++++++++++++++++++++++------ > > >>> 5 files changed, 66 insertions(+), 10 deletions(-) > > >>> > > >>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h > > >>> index d2434c1..282630e 100644 > > >>> --- a/arch/x86/include/asm/kexec.h > > >>> +++ b/arch/x86/include/asm/kexec.h > > >>> @@ -210,6 +210,7 @@ struct kexec_entry64_regs { > > >>> > > >>> typedef void crash_vmclear_fn(void); > > >>> extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss; > > >>> +extern void kdump_nmi_shootdown_cpus(void); > > >>> > > >>> #endif /* __ASSEMBLY__ */ > > >>> > > >>> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h > > >>> index ebd0c16..f70989c 100644 > > >>> --- a/arch/x86/include/asm/smp.h > > >>> +++ b/arch/x86/include/asm/smp.h > > >>> @@ -50,6 +50,7 @@ struct smp_ops { > > >>> void (*smp_cpus_done)(unsigned max_cpus); > > >>> > > >>> void (*stop_other_cpus)(int wait); > > >>> + void (*crash_stop_other_cpus)(void); > > >>> void (*smp_send_reschedule)(int cpu); > > >>> > > >>> int (*cpu_up)(unsigned cpu, struct task_struct *tidle); > > >>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > > >>> index 9616cf7..650830e 100644 > > >>> --- a/arch/x86/kernel/crash.c > > >>> +++ b/arch/x86/kernel/crash.c > > >>> @@ -133,15 +133,31 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) > > >>> disable_local_APIC(); > > >>> } > > >>> > > >>> -static void kdump_nmi_shootdown_cpus(void) > > >>> +void kdump_nmi_shootdown_cpus(void) > > >>> { > > >>> nmi_shootdown_cpus(kdump_nmi_callback); > > >>> > > >>> disable_local_APIC(); > > >>> } > > >>> > > >>> +/* Override the weak function in kernel/panic.c */ > > >>> +void crash_smp_send_stop(void) > > >>> +{ > > >>> + static int cpus_stopped; > > >>> + > > >>> + if (cpus_stopped) > > >>> + return; > > >>> + > > >>> + if (smp_ops.crash_stop_other_cpus) > > >>> + smp_ops.crash_stop_other_cpus(); > > >>> + else > > >>> + smp_send_stop(); > > >>> + > > >>> + cpus_stopped = 1; > > >>> +} > > >>> + > > >>> #else > > >>> -static void kdump_nmi_shootdown_cpus(void) > > >>> +void crash_smp_send_stop(void) > > >>> { > > >>> /* There are no cpus to shootdown */ > > >>> } > > >>> @@ -160,7 +176,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > > >>> /* The kernel is broken so disable interrupts */ > > >>> local_irq_disable(); > > >>> > > >>> - kdump_nmi_shootdown_cpus(); > > >>> + crash_smp_send_stop(); > > >>> > > >>> /* > > >>> * VMCLEAR VMCSs loaded on this cpu if needed. > > >>> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c > > >>> index 658777c..68f8cc2 100644 > > >>> --- a/arch/x86/kernel/smp.c > > >>> +++ b/arch/x86/kernel/smp.c > > >>> @@ -32,6 +32,8 @@ > > >>> #include <asm/nmi.h> > > >>> #include <asm/mce.h> > > >>> #include <asm/trace/irq_vectors.h> > > >>> +#include <asm/kexec.h> > > >>> + > > >>> /* > > >>> * Some notes on x86 processor bugs affecting SMP operation: > > >>> * > > >>> @@ -342,6 +344,9 @@ struct smp_ops smp_ops = { > > >>> .smp_cpus_done = native_smp_cpus_done, > > >>> > > >>> .stop_other_cpus = native_stop_other_cpus, > > >>> +#if defined(CONFIG_KEXEC_CORE) > > >>> + .crash_stop_other_cpus = kdump_nmi_shootdown_cpus, > > >>> +#endif > > >>> .smp_send_reschedule = native_smp_send_reschedule, > > >>> > > >>> .cpu_up = native_cpu_up, > > >>> diff --git a/kernel/panic.c b/kernel/panic.c > > >>> index ca8cea1..e6480e2 100644 > > >>> --- a/kernel/panic.c > > >>> +++ b/kernel/panic.c > > >>> @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) > > >>> panic_smp_self_stop(); > > >>> } > > >>> > > >>> +/* > > >>> + * Stop other CPUs in panic. Architecture dependent code may override this > > >>> + * with more suitable version. For example, if the architecture supports > > >>> + * crash dump, it should save registers of each stopped CPU and disable > > >>> + * per-CPU features such as virtualization extensions. > > >>> + */ > > >>> +void __weak crash_smp_send_stop(void) > > >>> +{ > > >>> + static int cpus_stopped; > > >>> + > > >>> + /* > > >>> + * This function can be called twice in panic path, but obviously > > >>> + * we execute this only once. > > >>> + */ > > >>> + if (cpus_stopped) > > >>> + return; > > >>> + > > >>> + /* > > >>> + * Note smp_send_stop is the usual smp shutdown function, which > > >>> + * unfortunately means it may not be hardened to work in a panic > > >>> + * situation. > > >>> + */ > > >>> + smp_send_stop(); > > >>> + cpus_stopped = 1; > > >>> +} > > >>> + > > >>> atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); > > >>> > > >>> /* > > >>> @@ -164,14 +190,21 @@ void panic(const char *fmt, ...) > > >>> if (!_crash_kexec_post_notifiers) { > > >>> printk_nmi_flush_on_panic(); > > >>> __crash_kexec(NULL); > > >>> - } > > >>> > > >>> - /* > > >>> - * Note smp_send_stop is the usual smp shutdown function, which > > >>> - * unfortunately means it may not be hardened to work in a panic > > >>> - * situation. > > >>> - */ > > >>> - smp_send_stop(); > > >>> + /* > > >>> + * Note smp_send_stop is the usual smp shutdown function, which > > >>> + * unfortunately means it may not be hardened to work in a > > >>> + * panic situation. > > >>> + */ > > >>> + smp_send_stop(); > > >>> + } else { > > >>> + /* > > >>> + * If we want to do crash dump after notifier calls and > > >>> + * kmsg_dump, we will need architecture dependent extra > > >>> + * works in addition to stopping other CPUs. > > >>> + */ > > >>> + crash_smp_send_stop(); > > >>> + } > > >>> > > >>> /* > > >>> * Run any panic handlers, including those that might need to > > >>> > > >>>