RE: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

河合英宏 / KAWAI，HIDEHIRO <hidehiro.kawai.ez@xxxxxxxxxxx> · Sat, 22 Aug 2015 01:43:00 +0000

> From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]
> 
> On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
> > infinitely in NMI context.  Especially for x86, cpu-B issues NMI IPI
> > to other cpus to save their register states and do some cleanups if
> > kdump is enabled, but cpu-A can't handle the NMI and fails to save
> > register states.
> >
> > To solve thie issue, we wait for the timing of the NMI IPI, then
> > call the NMI handler which saves register states.
> 
> Sorry, I don't follow, what?

First, a subroutine of crash_kexec(), nmi_shootdown_cpus()
send NMI IPI to non-panic cpus to stop them while saving their
registers ans doing some cleanups for crash dumping.  So if a non-panic
cpu is looping in NMI context infinitely at that time, we fail to save
its register information and lose the information from the crash dump.

`Infinite loop in NMI context' can happen when panic on NMI is about
to happen while another cpu has already been processing panic().
To save regs and do some cleanups in that case too, this patch does
two things:

1. Moves the timing of `infinite loop in NMI context' (actually
   panic_smp_self_stop()) outside of panic() to keep the pt_regs object
2. call a callback of nmi_shootdown_cpus() directly to save regs and
   do some cleanups after setting waiting_for_crash_ipi which is used
   for counting down the number of cpus which handled the callback

Does that answer your question?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

��.n��������+%������w��{.n�����{����*jg��������ݢj����G�������j:+v���w�m������w�������h�����٥