Vivek Goyal <vgoyal at redhat.com> writes:

> On Mon, Mar 12, 2012 at 03:14:20PM +0900, Fernando Luis Vázquez Cao wrote:
>
> [..]
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot patch when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed).
>
> I think there needs to be a limit to being paranoid. On one hand people
> want to run panic notifiers, all the kmsg_dump() hooks in panic path, and
> on the other hand we are afraid of even disabling LAPIC.

And the kmsg_dump code and the panic notifiers aren't being run. Having
seen some of their failure modes being patched up recently (adding and
removing sysfs files!) I'm very comfortable with the level of paranoia.

It has been proven time and time again that the more you do in the
failing kernel, the greater your likelihood of not getting your failure
information out.

> I personally think that disabling LAPIC is reasonably practical solution
> to the problem until and unless somebody shows that it deadlocks
> easily.

Disabling NMI generation in the LAPIC is fine, and for the short term I
don't even have a problem with disabling the entire LAPIC, as all of our
platforms seem to have code for completely reprogramming it.

At the same time there have been cases, like the i8259 routed through
the ExtInt pin of the LAPIC, that we haven't been given programming
information about and that, if we want them to work, we should avoid
touching.

Furthermore we have two reported cases of people experiencing real NMIs
on the kdump path. So we have to assume the presence of the CMOS NMI
disable as well if we are going to unequivocally disable NMIs.

Given the variety of x86 hardware today and the growing variety of x86
hardware tomorrow, we are going to be fixing this until we can actually
handle the NMIs. Hardware designers are unfortunately creative enough
that we aren't going to think of everything.

Given that it has taken us almost a decade to realize that there
actually is a real-world problem, I'm not too keen on a solution that is
just good enough to fix a small problem.

I would love it if x86 had an architectural NMI off switch, but with
Intel pushing EFI and the removal of the CMOS clock, x86 no longer has an
always-available NMI off switch.

Furthermore, handling NMIs is not hard; it is just a little tricky to
test.

Eric