> From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]
>
> On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
> > infinitely in NMI context. Especially for x86, cpu-B issues NMI IPI
> > to other cpus to save their register states and do some cleanups if
> > kdump is enabled, but cpu-A can't handle the NMI and fails to save
> > register states.
> >
> > To solve this issue, we wait for the timing of the NMI IPI, then
> > call the NMI handler which saves register states.
>
> Sorry, I don't follow, what?

First, nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends an
NMI IPI to the non-panicking cpus to stop them, saving their registers
and doing some cleanups for crash dumping. So if a non-panicking cpu is
looping infinitely in NMI context at that time, we fail to save its
register information, and that information is missing from the crash
dump.

An `infinite loop in NMI context' can happen when a panic on NMI is
about to happen while another cpu is already processing panic().

To save regs and do some cleanups in that case too, this patch does
two things:

1. Moves the `infinite loop in NMI context' (actually
   panic_smp_self_stop()) out of panic(), so that the pt_regs object
   is kept.

2. Calls the callback of nmi_shootdown_cpus() directly to save regs
   and do some cleanups, after waiting_for_crash_ipi has been set.
   waiting_for_crash_ipi is used to count down the number of cpus
   which have handled the callback.

Does that answer your question?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group