On Tue, Apr 28, 2015 at 02:44:28PM -0400, Don Zickus wrote:
> RAS doesn't go through the legacy ports (ie get_nmi_reason()). Instead it
> triggers the external NMI through a different bit (ioapic I think).

Well, I see it getting registered with __register_nmi_handler() which
adds it to the NMI_LOCAL type, i.e., ghes_notify_nmi() gets called by

default_do_nmi
|-> nmi_handle(NMI_LOCAL, regs, b2b);

AFAICT.

Which also explains the issue we were seeing, as that handler is called
on each NMI, even when the machine is running a perf workload.

> The nmi code has no idea what io_remap'ed address apei is using to map its
> error handling register that GHES uses. Unlike the legacy port which is
> always port 0x61.
>
> So, with NMI being basically a shared interrupt, with no ability to discern
> who sent the interrupt (and even worse no ability to know how _many_ were sent as
> the NMI is edge triggered instead of level triggered). As a result we rely
> on the NMI handlers to talk to their address space/registers to determine if
> they were the source of the interrupt.

I was afraid it would be something like that. We probably should poke hw
people to extend that NMI fun so that we can know who caused it.

<snip stuff I agree with>

> Anyway, any ideas or thoughts for improvement are always welcomed. :-)

Yeah, I'm afraid without hw support, that won't be doable. We need the
hw to tell us who caused the NMI. Otherwise we'll be round-robining (:-))
through handlers like nuts.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
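
P.S. To make the polling model discussed above concrete, here is a minimal
userspace sketch. It is not kernel code: register_handler(), dispatch_nmi(),
ghes_handler(), perf_handler() and the ghes_status/perf_overflow/ghes_polls
variables are all made up for illustration, and the plain ints merely stand
in for the device registers each real handler would read. It only shows the
shape of the problem, i.e., every registered handler gets asked on every NMI.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; every name below is hypothetical, not kernel API. */
#define MAX_HANDLERS 8

/* A handler polls its own device and returns how many events it claimed. */
typedef int (*nmi_handler_fn)(void);

static nmi_handler_fn handlers[MAX_HANDLERS];
static size_t nr_handlers;

static int ghes_status;    /* stand-in for the ioremap'ed GHES error status */
static int perf_overflow;  /* stand-in for a PMU counter overflow flag */
static int ghes_polls;     /* how often the GHES handler had to look */

static void register_handler(nmi_handler_fn fn)
{
	assert(nr_handlers < MAX_HANDLERS);
	handlers[nr_handlers++] = fn;
}

static int ghes_handler(void)
{
	ghes_polls++;              /* models the register read done on every NMI */
	if (!ghes_status)
		return 0;
	ghes_status = 0;
	return 1;
}

static int perf_handler(void)
{
	if (!perf_overflow)
		return 0;
	perf_overflow = 0;
	return 1;
}

/*
 * The NMI carries no source information, so ask everybody. Do not stop at
 * the first claimant: the line is edge triggered and several sources may
 * have fired at once.
 */
static int dispatch_nmi(void)
{
	int handled = 0;
	size_t i;

	for (i = 0; i < nr_handlers; i++)
		handled += handlers[i]();
	return handled;
}
```

With both perf_handler() and ghes_handler() registered, a dispatch caused
only by perf_overflow still bumps ghes_polls, which is the per-NMI overhead
we were seeing under a perf workload.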