On Sun, Mar 11, 2012 at 10:49:23PM -0700, H. Peter Anvin wrote:
> On 03/11/2012 10:43 PM, Fernando Luis Vázquez Cao wrote:
> >
> > To tackle this issue we can either stop the hardlockup detector
> > or disable the LAPIC (the NMIs needed by x86's hardlockup detector
> > are generated using performance counters in the LAPIC), leaving
> > the I/O APICs untouched. The second is simpler and I think it
> > is the approach Don took to fix this issue in RHEL kernels.
> >
> > Unfortunately, this is not enough; we are still exposed to external
> > NMIs not routed through the LAPIC. In other words, we have to make
> > sure that we always have an IDT that is able to handle NMIs without
> > seemingly random reboots and lockups. To achieve this goal we need
> > to fix machine_kexec() and the early IDT handlers. The current patch
> > set takes care of the latter.
> >
>
> The only source of NMIs other than the LAPIC should be the system error
> which can be disabled through the RTC port, so I think your second
> paragraph here is way more mechanism than you need for very little gain.

I forgot about the RTC port. I can't seem to find the documentation for
it, but I believe it was port 0x70? That would cover external NMIs, I
believe. Leaving the disable_lapic would cover internal NMIs.

I don't know how far we want to go with installing stub IDT handlers
and such. Honestly, I just wanted the I/O APIC race condition fixed.

http://lkml.indiana.edu/hypermail/linux/kernel/1202.3/02533.html

Cheers,
Don
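
For reference, a minimal sketch of the RTC-port mechanism discussed above.
It assumes the conventional PC layout, where port 0x70 is the RTC/CMOS
index port and bit 7 of the byte written there masks external NMIs. The
helper names are hypothetical and not taken from the kernel, and the
sketch only builds the byte; it does not touch any I/O port. How many
NMI sources this bit actually gates depends on the chipset.

```c
#include <stdint.h>

/* Conventional PC values, assumed here rather than taken from the thread. */
#define CMOS_INDEX_PORT   0x70u  /* RTC/CMOS index port */
#define CMOS_NMI_DISABLE  0x80u  /* bit 7 set: external NMIs masked */

/*
 * Hypothetical helper: build the byte to write to CMOS_INDEX_PORT.
 * The low 7 bits select the CMOS register; bit 7 carries the NMI mask.
 */
static uint8_t cmos_index_byte(uint8_t reg, int mask_nmi)
{
    uint8_t value = (uint8_t)(reg & 0x7Fu);   /* keep only the register index */

    if (mask_nmi)
        value |= CMOS_NMI_DISABLE;            /* request NMI masking */
    return value;
}

/* Hypothetical helper: report whether an index byte has the NMI mask set. */
static int cmos_index_masks_nmi(uint8_t index_byte)
{
    return (index_byte & CMOS_NMI_DISABLE) != 0;
}
```

In kernel context the resulting byte would be written with something like
outb(cmos_index_byte(reg, 1), CMOS_INDEX_PORT). Since the same port also
selects the CMOS register, any later write to it has to keep bit 7 set or
the mask is lost.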