On Tue, 2018-11-06 at 14:53 +0000, Lorenzo Pieralisi wrote: > > > Not sure why (5) is not used in your lists, I assume because you want > to highlight the race condition with the jump from 4 to 6 (or maybe > you do not like number 5 :), just curious). It's because I removed a step and I'm used to rst docs where it renumbers automatically. > > > The observant will notice that point 4 present the opportunity for the > > SoC's interrupt controller to lose the interrupt in the same manner as > > the bug in this driver. The driver for that interrupt controller will > > be written to properly deal with this. In some cases the hardware > > supports an EOI concept, where the 2nd IRQ is masked and internally > > queued in the hardware, to be re-raised at EOI in step 7. In other > > cases the IRQ will be unmasked and re-raised at step 4, but the kernel > > will see the handler is INPROGRESS and not re-invoke it, and instead set > > a PENDING flag, which causes the handler to re-run at step 7. > > > > [1] > > > > Fixes: 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled, not before") > > I have two questions: > > - Commit 8c934095fa2f has been in the kernel for a year and no > regression was reported. It was assumed to fix a problem so before > reverting it I want to make sure we are not breaking anything else. There have been reports. Vignesh reported this some time ago. Also see https://lkml.org/lkml/2018/11/2/55, the author indicated to me that it was the caused by this problem after I pointed him to my patch. And I've reported it now too. We only updated to a kernel with the regression a few months ago. I think imx6/7 systems don't update their kernels that quickly. Usually not running a mainstream distro with automatic updates. And there are probably a lot of people using the NXP vendor kernel, which has a ton of unmerged patches, and so they don't see this regression yet. > - Your reasoning seems correct but I would pick Marc's brain on this > because I want to understand if what this patch does is what IRQ core > expects it to do, especially in relation to the IRQ chaining you > are mentioning. I've traced this through the ARM arch code and used our custom PCI device to generate interrupts in all the race windows. It does work as I say, though I can't say this it how it's supposed to work.