On 03/13/2012 05:16 AM, H. Peter Anvin wrote:
> On 03/12/2012 01:04 PM, H. Peter Anvin wrote:
>> On 03/12/2012 01:01 PM, Eric W. Biederman wrote:
>>> The basic problem is which source do we block this at? How many
>>> sources are their? And architecturally last I looked x86 no longer
>>> has a NMI disable EFI and similar systems want to get away without
>>> a CMOS legacy clock because designers so often get them wrong.
>>>
>> On all processors which have an LAPIC you can block all NMI sources at
>> the LAPIC. I think it's safe to assume that if you don't have an LAPIC
>> -- an ancient system by now -- you have port 70h.
>>
> One thing: *disabling* the LAPIC will allow external NMIs coming in on
> LINT1 through, since the LAPIC in the disabled state tries to mimic the
> no-LAPIC configuration. So I don't think you want to disable LAPIC as
> much as disable the interrupt vectors within.

Does this sound like a plan to get the ball rolling?

1.- Merge Don's patch to disable the LAPIC in the kdump reboot path
(this fixes a real issue seen in the field, is a net win, and is
certainly not a regression; indeed, it makes the code simpler because
the I/O APICs are left untouched).

2.- Merge my patch set to ignore early NMIs (this brings the behavior
of the boot code in line with what we do in the rest of the kernel, and
we can avoid situations where a spurious NMI causes the kernel to halt).
The early NMI handler is temporary; the final NMI handler, installed
shortly afterwards, will take care of subsequent NMIs.

3.- Make sure that spurious NMIs (i.e. NMIs that for whatever reason
could not be stopped at the source) received during the reboot path to
the kdump kernel do not cause a triple fault or a system lockup. This is
under testing.

4.- Identify all the NMI sources and keep them from reaching the CPU
when it can be done in a race-free way.

Can we get 1 and 2 merged while we work on further improvements (3 and
4)?

Thanks,
Fernando