Hi Stephen, On 04/01/2019 21:45, Stephen Warren wrote: > From: Stephen Warren <swarren@xxxxxxxxxx> > > The current code does this when handling MSI IRQs: > > a) Process the irq. > b) Clear the latched IRQ status. > > If a new IRQ occurs any time after (a) has read the IRQ status for the > last time and before (b), it will be lost. For example, this occurs in > practice when using a Marvell 9171 AHCI controller with NCQ enabled; > many command timeouts occur with certain disk access patterns. > > Fix the code to do the following instead, so that if any new IRQs are > raised during the processing of the IRQ, the IRQ status is not cleared, > so that the IRQ is not lost. > > a) Clear the latched IRQ status. > b) Process the IRQ. > > This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt > status after it is handled, not before") > > This change re-applies commit ca1658921b63 ("PCI: designware: Fix > missing MSI IRQs") > > Signed-off-by: Stephen Warren <swarren@xxxxxxxxxx> > Cc: Arnd Bergmann <arnd@xxxxxxxx> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Cc: Faiz Abbas <faiz_abbas@xxxxxx> > Cc: Harro Haan <hrhaan@xxxxxxxxx> > Cc: Jingoo Han <jg1.han@xxxxxxxxxxx> > Cc: Joao Pinto <jpinto@xxxxxxxxxxxx> > Cc: Juergen Beisert <jbe@xxxxxxxxxxxxxx> > Cc: Marek Vasut <marex@xxxxxxx> > Cc: Matthias Mann <m.mann@xxxxxxxxxxxxxxxxxxxxxx> > Cc: Mohit Kumar <mohit.kumar@xxxxxx> > Cc: Pratyush Anand <pratyush.anand@xxxxxx> > Cc: Richard Zhu <hong-xing.zhu@xxxxxxxxxxxxx> > Cc: Sean Cross <xobs@xxxxxxxxxx> > Cc: Shawn Guo <shawn.guo@xxxxxxxxxx> > Cc: Siva Reddy Kallam <siva.kallam@xxxxxxxxxxx> > Cc: Srikanth T Shivanand <ts.srikanth@xxxxxxxxxxx> > Cc: Tim Harvey <tharvey@xxxxxxxxxxxxx> > --- > Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels. > However, the exact same code structure is present in mainline and I have > no reason to believe the problem would not reproduce there. I have > compile tested but not runtime tested it in mainline, since my board is > not yet supported in mainline. > --- > drivers/pci/controller/dwc/pcie-designware-host.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c > index 692dd1b264fb..7fd6c56a6f35 100644 > --- a/drivers/pci/controller/dwc/pcie-designware-host.c > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c > @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp) > irq = irq_find_mapping(pp->irq_domain, > (i * MAX_MSI_IRQS_PER_CTRL) + > pos); > - generic_handle_irq(irq); > dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS + > (i * MSI_REG_CTRL_BLOCK_SIZE), > 4, 1 << pos); > + generic_handle_irq(irq); > pos++; > } > } This fix was already suggested by Trent Piepho, however, after review by Marc Zyngier and some information obtained from Synopsys IP development team, the following set of patches was generated to correct the problem. 830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling") fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt") 3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback") Regards, Gustavo