On 03/12/2012 02:49 PM, H. Peter Anvin wrote:
> On 03/11/2012 10:43 PM, Fernando Luis Vázquez Cao wrote:
>> To tackle this issue we can either stop the hardlockup detector
>> or disable the LAPIC (the NMIs needed by x86's hardlockup detector
>> are generated using performance counters in the LAPIC), leaving
>> the I/O APICs untouched. The second is simpler and I think it
>> is the approach Don took to fix this issue in RHEL kernels.
>>
>> Unfortunately, this is not enough, we are still exposed to external
>> NMIs not routed through the LAPIC. In other words, we have to make
>> sure that we always have an IDT that is able to handle NMIs without
>> seemingly random reboots and lockups. To achieve this goal we need
>> to fix machine_kexec() and the early IDT handlers. The current patch
>> set takes care of the latter.
>
> The only source of NMIs other than the LAPIC should be the system error
> which can be disabled through the RTC port, so I think your second
> paragraph here is way more mechanism than you need for very little gain.

The thing is that we want to avoid touching hardware in the kdump
reboot path whenever we can, the premise being that hardware cannot
be accessed without risking a lockup or worse (as the deadlock when
accessing the I/O APIC showed). The kernel is crashing, after all.

What is more, I forgot to mention that the long-term goal is to leave
the LAPIC untouched too (we really want to keep the number of things
we do in the context of the crashing kernel to the bare minimum), so
we would still need to fix the early IDT. My patch set just installs
a special handler for the NMI case, so I think it is pretty simple and
self-contained.

Another reason to apply these patches is consistency with the rest of
the kernel. Spurious NMIs that would be ignored once the final IDT is
installed cause the system to halt if they happen to arrive while the
early IDT is still in place.

- Fernando