On Mon, 27 Feb 2017, Thomas Gleixner wrote: > On Sat, 25 Feb 2017, Linus Torvalds wrote: > > There are several things that set IRQS_PENDING, ranging from "try to > > test mis-routed interrupts while irqd was working", to "prepare for > > suspend losing the irq for us", to "irq auto-probing uses it on > > unassigned probable irqs". > > > > The *actual* reason to re-send, namely getting a nested irq that we > > had to drop because we got a second one while still handling the first > > (or because it was disabled), is just one case. > > > > Personally, I'd suspect some left-over state from auto-probing earlier > > in the boot, but I don't know. Could we fix that underlying issue? > > I'm on it. Adding a few trace points to the irq code gives me a pretty consistent picture across a bunch of different machines. The pending interrupt issue happens, at least on my test boxen, mostly on the 'legacy' interrupts (0 - 15). But even the IOAPIC interrupts >=16 happen occasionally. - Spurious interrupts on IRQ7, which are triggered by IRQ 0 (PIT/HPET). On one of the affected machines this stops when the interrupt system is switched to interrupt remapping !?!?!? - Spurious interrupts on various interrupt lines, which are triggered by IOAPIC interrupts >= IRQ16. That's a known issue on quite some chipsets that the legacy PCI interrupt (which is used when IOAPIC is disabled) is triggered when the IOAPIC >=16 interrupt fires. - Spurious interrupt caused by driver probing itself. I.e. the driver probing code causes an interrupt issued from the device inadvertently. That happens even on IRQ >= 16. This problem might be handled by the device driver code itself, but that's going to be ugly. See below. None of that was caused by misrouted irq or autoprobing. Autoprobing is not used on any modern maschine at all and the misrouted mechanism cannot set the pending flag on a interrupt which has no action installed. Sure, we might not set the pending flag on spurious interrupts, when an interrupt has no action assigned. We had it that way quite some years ago. But that caused issues on a couple of (non x86) systems because the peripheral which sent the spurious interrupt was waiting for acknowledgement forever and therefor not sending new interrupts. There was no way to handle that other than resending the interrupt after the action was installed. The reason for this is the following race: Device driver init() ack_bogus_pending_irq_in_device(); <---- Device sends spurious interrupt request_irq(); We cannot call ack_bogus_pending_irq_in_device() after the handler has been installed because it might clear a real interrupt before the handler is invoked. In fact we can, but that requires pretty ugly code. See below. Yes, it's broken hardware, but in that area there is more broken than sane hardware. The other point here is, that the "IOAPIC triggers spurious interrupt on legacy lines" issue can actually cause the same problem as the immediate resend/retrigger: request_irq(); <---- Other device sends IOAPIC interrupt which triggers the legacy line. If the handler is not prepared at that point, then it explodes as well. Same for this odd IRQ0 -> IRQ7 trigger, which I can observe on two machines. The whole resend/retrigger business is only relevant for edge type interrupts to deal with the following problem: disable_irq(); <---- Device sends interrupt enable_irq(); ===> Previously sent interrupt has not been acked in the device so no further interrupts happen. We lost an edge and are toast. We never do the resend/retrigger for level type interrupts, because they are resent by hardware if the interrupt is still pending, which is the case when you don't acknowlegde it at the device level. So now you might argue that a driver could handle this by doing: disable_irq(); <---- Device sends interrupt enable_irq(); lock_device(); do_some_magic_checks_and_handle_pending_irq(); unlock_device(); Requiring this would add tons of even more broken code to the device drivers and I really doubt that we want to go there. Same for the request_irq() case. I rather prefer that drivers are able to deal with spurious interrupts even on non shared interrupt lines and let the resend/retrigger mechanism expose them early when they do not. We can try to sample more data from the machines of affected users, but I doubt that it will give us more information than confirming that we really have to deal with all that hardware wreckage out there in some way or the other. Thanks, tglx