> From: Michal Hocko [mailto:mhocko at kernel.org]
>
> On Thu 30-07-15 11:55:52, KAWAI, HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mhocko at kernel.org]
> [...]
> > > Could you point me to the code which does that, please? Maybe we are
> > > missing that in our 3.0 kernel. I was quite surprised to see this
> > > behavior as well.
> >
> > Please see the snippet below.
> >
> > void setup_local_APIC(void)
> > {
> > ...
> > 	/*
> > 	 * only the BP should see the LINT1 NMI signal, obviously.
> > 	 */
> > 	if (!cpu)
> > 		value = APIC_DM_NMI;
> > 	else
> > 		value = APIC_DM_NMI | APIC_LVT_MASKED;
> > 	if (!lapic_is_integrated())		/* 82489DX */
> > 		value |= APIC_LVT_LEVEL_TRIGGER;
> > 	apic_write(APIC_LVT1, value);
> >
> >
> > LINT1 pins of cpus other than CPU 0 are masked here.
> > However, at least on some of Hitachi servers, NMI caused by NMI
> > button doesn't seem to be delivered through LINT1. So, my `external NMI'
> > word may not be correct.
>
> I am not familiar with details here but I can tell you that this
> particular code snippet is the same in our 3.0 based kernel so it seems
> that the HW is indeed doing something differently.

Yes, and it turned out my PATCH 3/3 doesn't work at all on some
hardware...

> > > You might still get a panic on hardlockup which will happen on all CPUs
> > > from the NMI context so we have to be able to handle panic in NMI on
> > > many CPUs.
> >
> > Do you say about the case of a kernel panic while other cpus locks up
> > in NMI context? In that case, there is no way to do things needed by
> > kdump procedure including saving registers...
>
> I am saying that watchdog_overflow_callback might trigger on more CPUs
> and panic from NMI context as well. So this is not reduced to the NMI
> button sends NMI to more CPUs.

I understand. So, I also have to modify watchdog_overflow_callback
to call nmi_panic().

> Why cannot the panic() context save all the registers if we are going to
> loop in NMI context? This would be imho preferable to returning from
> panic IMO.

I'm not saying we cannot save registers and do some cleanups in NMI
context. I feel that it would introduce unneeded complexity. Since
watchdog_overflow_callback is defined as generic code, we would need to
implement the preparation for kdump for other architectures, too. I
haven't checked which architectures support both the NMI watchdog and
kdump, though.

Anyway, I came up with a simple solution for x86: wait in nmi_panic()
until nmi_shootdown_cpus() has been called, then invoke the callback
registered by nmi_shootdown_cpus().

> > > I can provide the full log but it is quite mangled. I guess the
> > > CPU130 was the only one allowed to proceed with the panic while others
> > > returned from the unknown NMI handling. It took a lot of time until
> > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
> > > reports. CPU0 is most probably locked up waiting for CPU130 to
> > > acknowledge the IPI which will not happen apparently.
> >
> > There is a timeout of 1000ms in nmi_shootdown_cpus(), so I don't know
> > why CPU 130 waits so long. I'll try to consider for a while.
>
> Yes, I do not understand the timing here either and the fact that the
> log is a complete mess in the important parts doesn't help a wee bit.

I'm interested in where "Kernel panic - not syncing: " appears in the
log. It may give us a clue.

Regards,
Kawai
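
For illustration only, here is a rough user-space sketch of the x86 idea
described above: the first CPU to reach nmi_panic() goes on to panic(), and
any other CPU that panics in NMI context waits until nmi_shootdown_cpus()
has registered its callback and then runs it. This is not the actual patch.
Every sketch_* name is hypothetical, C11 atomics stand in for kernel
primitives, and picking the winner with a compare-and-swap is an assumption
of this sketch.

```c
#include <stdatomic.h>

#define SKETCH_MAX_CPUS	256
#define SKETCH_NO_CPU	(-1)

typedef void (*sketch_cb_t)(int cpu);

/* Hypothetical state; the real code may look quite different. */
static atomic_int sketch_panic_cpu = SKETCH_NO_CPU;
static _Atomic(sketch_cb_t) sketch_shootdown_cb;	/* NULL until registered */
static int sketch_regs_saved[SKETCH_MAX_CPUS];

/* Stand-in for the per-CPU kdump work (save registers, then stop). */
static void sketch_save_regs_and_stop(int cpu)
{
	sketch_regs_saved[cpu] = 1;
}

/* Stand-in for nmi_shootdown_cpus(): publish the callback to waiters. */
static void sketch_nmi_shootdown_cpus(sketch_cb_t cb)
{
	atomic_store(&sketch_shootdown_cb, cb);
}

/*
 * Stand-in for nmi_panic().
 * Returns  1 if this CPU won the race and should go on to panic(),
 *          0 if it lost, waited, and ran the shootdown callback,
 *         -1 if no callback was registered within max_spins polls.
 */
static int sketch_nmi_panic(int cpu, long max_spins)
{
	int expected = SKETCH_NO_CPU;

	if (atomic_compare_exchange_strong(&sketch_panic_cpu, &expected, cpu))
		return 1;

	for (long i = 0; i < max_spins; i++) {
		sketch_cb_t cb = atomic_load(&sketch_shootdown_cb);

		if (cb) {
			cb(cpu);
			return 0;
		}
	}
	return -1;
}
```

The bounded poll only keeps the sketch terminating when no callback is ever
registered; how long a real CPU should wait in NMI context is left open here.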