On 07/01/2019 17:16, Stephen Warren wrote: > On 1/7/19 2:36 AM, Gustavo Pimentel wrote: >> Hi Stephen, >> >> On 04/01/2019 21:45, Stephen Warren wrote: >>> From: Stephen Warren <swarren@xxxxxxxxxx> >>> >>> The current code does this when handling MSI IRQs: >>> >>> a) Process the irq. >>> b) Clear the latched IRQ status. >>> >>> If a new IRQ occurs any time after (a) has read the IRQ status for the >>> last time and before (b), it will be lost. For example, this occurs in >>> practice when using a Marvell 9171 AHCI controller with NCQ enabled; >>> many command timeouts occur with certain disk access patterns. >>> >>> Fix the code to do the following instead, so that if any new IRQs are >>> raised during the processing of the IRQ, the IRQ status is not cleared, >>> so that the IRQ is not lost. >>> >>> a) Clear the latched IRQ status. >>> b) Process the IRQ. >>> >>> This change reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt >>> status after it is handled, not before") >>> >>> This change re-applies commit ca1658921b63 ("PCI: designware: Fix >>> missing MSI IRQs") >>> >>> Signed-off-by: Stephen Warren <swarren@xxxxxxxxxx> >>> Cc: Arnd Bergmann <arnd@xxxxxxxx> >>> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >>> Cc: Faiz Abbas <faiz_abbas@xxxxxx> >>> Cc: Harro Haan <hrhaan@xxxxxxxxx> >>> Cc: Jingoo Han <jg1.han@xxxxxxxxxxx> >>> Cc: Joao Pinto <jpinto@xxxxxxxxxxxx> >>> Cc: Juergen Beisert <jbe@xxxxxxxxxxxxxx> >>> Cc: Marek Vasut <marex@xxxxxxx> >>> Cc: Matthias Mann <m.mann@xxxxxxxxxxxxxxxxxxxxxx> >>> Cc: Mohit Kumar <mohit.kumar@xxxxxx> >>> Cc: Pratyush Anand <pratyush.anand@xxxxxx> >>> Cc: Richard Zhu <hong-xing.zhu@xxxxxxxxxxxxx> >>> Cc: Sean Cross <xobs@xxxxxxxxxx> >>> Cc: Shawn Guo <shawn.guo@xxxxxxxxxx> >>> Cc: Siva Reddy Kallam <siva.kallam@xxxxxxxxxxx> >>> Cc: Srikanth T Shivanand <ts.srikanth@xxxxxxxxxxx> >>> Cc: Tim Harvey <tharvey@xxxxxxxxxxxxx> >>> --- >>> Note: This issue was found in downstream NVIDIA 4.9 and 4.14 kernels. >>> However, the exact same code structure is present in mainline and I have >>> no reason to believe the problem would not reproduce there. I have >>> compile tested but not runtime tested it in mainline, since my board is >>> not yet supported in mainline. >>> --- >>> drivers/pci/controller/dwc/pcie-designware-host.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c >>> index 692dd1b264fb..7fd6c56a6f35 100644 >>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c >>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c >>> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp) >>> irq = irq_find_mapping(pp->irq_domain, >>> (i * MAX_MSI_IRQS_PER_CTRL) + >>> pos); >>> - generic_handle_irq(irq); >>> dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS + >>> (i * MSI_REG_CTRL_BLOCK_SIZE), >>> 4, 1 << pos); >>> + generic_handle_irq(irq); >>> pos++; >>> } >>> } >> >> This fix was already suggested by Trent Piepho, however, after review by Marc >> Zyngier and some information obtained from Synopsys IP development team, the >> following set of patches was generated to correct the problem. >> >> 830920e065e9 ("PCI: dwc: Use interrupt masking instead of disabling") >> fce5423e4f43 ("PCI: dwc: Take lock when ACKing an interrupt") >> 3f7bb2ec20ce ("PCI: dwc: Move interrupt acking into the proper callback") > > OK. I assume that dw_pci_bottom_ack() gets called before > dw_handle_msi_irq() for each IRQ? In which case, the last of those 3 Yes, the dw_pci_bottom_ack() gets called before dw_handle_msi_irq(). > changes is essentially identical to the patch I proposed, it's just that > the acking is handled in a separate function rather than earlier in the > same function. > No, there is also two other fixes. A missing lock/unlock on the ack and the correct use of the MASK/UNMASK interrupt register instead of ENABLE interrupt register.