> On Wed, Nov 25, 2015 at 05:51:59AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > > `Infinite loop in NMI context' can happen: > > > > > > > > a. when a cpu panics on NMI while another cpu is processing panic > > > > b. when a cpu received an external or unknown NMI while another > > > > cpu is processing panic on NMI > > > > > > > > In the case of a, it loops in panic_smp_self_stop(). In the case > > > > of b, it loops in raw_spin_lock() of nmi_reason_lock. > > > > > > Please describe those two cases more verbosely - it takes slow people > > > like me a while to figure out what exactly can happen. > > > > a. when a cpu panics on NMI while another cpu is processing panic > > Ex. > > CPU 0 CPU 1 > > ================= ================= > > panic() > > panic_cpu <-- 0 > > check panic_cpu > > crash_kexec() > > receive an unknown NMI > > unknown_nmi_error() > > nmi_panic() > > panic() > > check panic_cpu > > panic_smp_self_stop() > > infinite loop in NMI context > > > > b. when a cpu received an external or unknown NMI while another > > cpu is processing panic on NMI > > Ex. > > CPU 0 CPU 1 > > ====================== ================== > > receive an unknown NMI > > unknown_nmi_error() > > nmi_panic() receive an unknown NMI > > panic_cpu <-- 0 unknown_nmi_error() > > panic() nmi_panic() > > check panic_cpu panic > > crash_kexec() check panic_cpu > > panic_smp_self_stop() > > infinite loop in NMI context > > Ok, that's what I saw too, thanks for confirming. > > But please write those examples with the *old* code in the commit > message, i.e. without panic_cpu and nmi_panic() which you're adding. Does *old* code mean the code without this patch *series* ? panic_cpu and nmi_panic() are introduced by PATCH 1/4, not this patch. > Basically, you want to structure your commit message this way: > > This is the problem the current code has: ... > > But we need to do this: ... > > We fix it by doing that: ... Good suggestion! I'll revise a bit with following your comment. > > > > + * directly. This function is used when we have already been in NMI handler. > > > > + */ > > > > +void poll_crash_ipi_and_callback(struct pt_regs *regs) > > > > > > Why "poll"? We won't return from crash_nmi_callback() if we're not the > > > crashing CPU. > > > > This function polls that crash IPI has been issued by checking > > crash_ipi_done, then invokes the callback. This is different > > from so-called "poll" functions. Do you have some good name? > > Maybe something as simple as "run_crash_callback"? I prefer this, but we might want to add some more prefix or suffix. For example, "conditionally_run_crash_nmi_callback". > Or since we're calling it from other places, maybe add the "crash" > prefix: > > while (!raw_spin_trylock(&nmi_reason_lock)) > crash_run_callback(regs); > > Looks even better to me in code context. :) Thanks for your deep review! -- Hidehiro Kawai Hitachi, Ltd. Research & Development Group ��.n��������+%������w��{.n�����{����*jg��������ݢj����G�������j:+v���w�m������w�������h�����٥