On Fri, 2019-01-04 at 14:45 -0700, Stephen Warren wrote:
> From: Stephen Warren <swarren@xxxxxxxxxx>
>
> The current code does this when handling MSI IRQs:
>
> a) Process the irq.
> b) Clear the latched IRQ status.
>
> If a new IRQ occurs any time after (a) has read the IRQ status for the
> last time and before (b), it will be lost. For example, this occurs in

There was a patch series to fix this bug, sent in October and stretching
into December. See "PCI: dwc: Fix interrupt race in when handling MSI"
and "PCI: designware: Move interrupt acking into the proper callback".

The result was more code changes, which in the end produce the same
order of operations as is done here to fix the race. But it uses the
irq framework correctly.

I do not think this problem was introduced until 4.14, so the NVIDIA 4.9
kernel must have additional patches if this is observed there. I'm sure
it was not present in 4.12, as we used that, and did not see the issue
until updating to 4.16.

The current fix (for 4.21?) depends on significant restructuring that
was done to this driver around 4.17, see "PCI: dwc: Move MSI IRQs
allocation to IRQ domains hierarchical API".

I think my original patch ("Fix interrupt race ..") or your patch here,
which is basically the same, should be ported to 4.14 stable. But there
are some other opinions on that. It's not clear to me whether the
hierarchical domain stuff would be backported to the stable series. It
does not seem to me to be a good idea to do so.
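
For anyone who wants to see the lost-interrupt window from the quoted description in isolation, here is a minimal sketch in Python. It is a toy model, not driver code: the LatchedStatus class, the service() and simulate() helpers, and the single-vector setup are all made up for illustration, and it assumes a status register where a second MSI on an already-latched bit is indistinguishable from the first.

```python
class LatchedStatus:
    """Hypothetical model of a latched MSI status register."""

    def __init__(self):
        self.bits = 0

    def raise_irq(self, vector):
        # Hardware latches the interrupt; a bit that is already set stays set.
        self.bits |= 1 << vector

    def read(self):
        return self.bits

    def clear(self, mask):
        # Software acknowledges (clears) the given bits.
        self.bits &= ~mask


def service(status, clear_first, arrival=None):
    """Run one handler pass and return how many vectors were processed.

    `arrival`, if given, is called once after processing to model a new
    MSI landing in the window before the handler finishes.
    """
    pending = status.read()
    if clear_first:
        status.clear(pending)          # fixed order: clear, then process
    handled = bin(pending).count("1")  # "process" every pending vector
    if arrival is not None:
        arrival()                      # new MSI arrives in the window
    if not clear_first:
        status.clear(pending)          # buggy order: clear wipes the new MSI
    return handled


def simulate(clear_first):
    """Device raises vector 0 twice; return how many times it was handled."""
    status = LatchedStatus()
    status.raise_irq(0)                                   # first MSI
    handled = service(status, clear_first,
                      arrival=lambda: status.raise_irq(0))  # second MSI
    handled += service(status, clear_first)               # next handler pass
    return handled


if __name__ == "__main__":
    print("process then clear:", simulate(clear_first=False))
    print("clear then process:", simulate(clear_first=True))
```

With process-then-clear, the model handles only one of the two MSIs, because the late clear also wipes the bit that the second MSI relied on. With clear-then-process, the second MSI re-latches the bit and is picked up on the next pass.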