Hi,

On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote:
> When an HA cluster software or administrator detects non-response
> of a host, they issue an NMI to the host to completely stop current
> works and take a crash dump. If the kernel has already panicked
> or is capturing a crash dump at that time, further NMI can cause
> a crash dump failure.
>
> To solve this issue, this patch set does two things:
>
> - Don't panic on NMI if the kernel has already panicked
> - Introduce "noextnmi" boot option which masks external NMI at the
>   boot time (supported only for x86)

I am currently debugging the same issue for our customer. Curiously
enough, the issue happens on Hitachi HW. I haven't posted my patch for
upstream review yet because I have not received any feedback on it,
but I believe your solution is unnecessarily complex. Unless I am
missing something, the following should be enough, no?
---
From ba6ef85d26113e720a630ea22b08efef5b70210f Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxx>
Date: Fri, 17 Jul 2015 15:17:08 +0200
Subject: [PATCH] kexec: Never return from crash_kexec when kexec is in progress

We had a report that the kdump kernel did not boot after an unknown NMI
had been delivered while unknown_nmi_panic was enabled. The NMI is
triggered by HW and it is delivered to all CPUs at the same time. The
machine has hundreds of CPUs and the most plausible theory is that one
CPU really manages to kick the kexec but it cannot shut down all the
other CPUs because they are processing the NMI and so cannot process an
IPI. Another CPU then manages to call smp_send_stop from a concurrent
panic and this stops the kexec CPU, which has managed to switch to the
new kernel and doesn't run in the NMI mode anymore.

Fix this by making crash_kexec never return if there is a kexec in
progress. This can be done easily by relying on the fact that
kexec_mutex will never be released for an ongoing kexec, so we just
have to loop over the trylock.
The only tricky part is that kexec_crash_image might not be loaded, in
which case we have to return. The check has to be done under the lock.
Extract the trylock and the check into try_crash_kexec and make it
return true only if crash kexec is disabled (no crash kernel is loaded).

Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
---
 kernel/kexec.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c1015e25..d61b1478167d 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1470,7 +1470,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 
 #endif /* CONFIG_KEXEC_FILE */
 
-void crash_kexec(struct pt_regs *regs)
+static bool try_crash_kexec(struct pt_regs *regs)
 {
 	/* Take the kexec_mutex here to prevent sys_kexec_load
 	 * running on one cpu from replacing the crash kernel
@@ -1490,7 +1490,20 @@ void crash_kexec(struct pt_regs *regs)
 			machine_kexec(kexec_crash_image);
 		}
 		mutex_unlock(&kexec_mutex);
+		return true;
 	}
+	return false;
+}
+
+void crash_kexec(struct pt_regs *regs)
+{
+	/*
+	 * Never return from this function if a kexec is in progress
+	 * already because next steps might interfere with it.
+	 * try_crash_kexec will never succeed in such a case.
+	 */
+	while (!try_crash_kexec(regs))
+		cpu_relax();
 }
 
 size_t crash_get_memory_size(void)
-- 
2.1.4

-- 
Michal Hocko
SUSE Labs
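
For anyone who wants to play with the locking idea outside the kernel, here is a rough user-space sketch of the "loop over the trylock" pattern described in the changelog. It is not the kernel code: every sketch_* name is hypothetical, a C11 atomic_flag stands in for kexec_mutex, a plain bool stands in for kexec_crash_image, and the spin is bounded so that the program can terminate (the real crash_kexec would spin forever). Since machine_kexec never returns in the kernel, the sketch models that case by leaving the lock held and reporting failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kexec_mutex and kexec_crash_image. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;
static bool sketch_image_loaded;
static bool sketch_kexec_started;

/* Placeholder for the kernel's cpu_relax(); a no-op in this sketch. */
static void sketch_cpu_relax(void)
{
}

/*
 * Returns true only when the lock was taken and no crash image was loaded,
 * i.e. the caller is allowed to return. Returns false when the lock is
 * already held or when this call started the (simulated) kexec.
 */
bool sketch_try_crash_kexec(void)
{
	/* test_and_set returns the previous value: true means already held. */
	if (atomic_flag_test_and_set(&sketch_lock))
		return false;

	if (sketch_image_loaded) {
		/*
		 * Assumption: models machine_kexec() not returning. The lock
		 * is intentionally left held so later callers keep spinning.
		 */
		sketch_kexec_started = true;
		return false;
	}

	atomic_flag_clear(&sketch_lock);
	return true;
}

/*
 * Bounded variant of the loop in crash_kexec. Returns true if the caller
 * may proceed, false if the spin budget ran out (the kernel loop would
 * simply never return in that case).
 */
bool sketch_crash_kexec_bounded(unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		if (sketch_try_crash_kexec())
			return true;
		sketch_cpu_relax();
	}
	return false;
}
```

With no image loaded, sketch_crash_kexec_bounded returns true on the first try and releases the lock. Once sketch_image_loaded is set, the first caller starts the simulated kexec and every later caller fails the trylock, which is the property the patch relies on.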