Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI

Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> · Tue, 6 Nov 2018 14:53:47 +0000

[CC Marc]

On Sat, Oct 27, 2018 at 12:00:57AM +0000, Trent Piepho wrote:
> This reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status
> after it is handled, not before").
> 
> This is a very real race that we observed quickly after switching from
> 4.13 to 4.16.  Using a custom PCI-e endpoint and driver, I was able to
> track it to the precise race and verify the fixed behavior, as will be
> described below.
> 
> This bug was originally fixed in 2013, in commit ca1658921b63 ("PCI:
> designware: Fix missing MSI IRQs") The discussion of that commit,
> archived in patchwork [1], is informative and worth reading.
> 
> The bug was re-added in the '8c934 commit this is reverting, which
> appeared in the 4.14 kernel.
> 
> Unfortunately, Synopsys appears to consider the operation of this PCI-e
> controller secret.  They provide no publicly available docs for it nor
> allow the references manuals of SoCs using the controller to publish any
> documentation of it.
> 
> So I can not say certain this code is correctly using the controller's
> features.  However, extensive testing gives me high confidence in the
> accuracy of what is described below.
> 
> If an MSI is received by the PCI-e controller while the status register
> bit for that MSI is set, the PCI-e controller will NOT generate another
> interrupt.  In addition, it will NOT queue or otherwise mark the
> interrupt as "pending", which means it will NOT generate an interrupt
> when the status bit is unmasked.
> 
> This gives the following race scenario:
> 
> 1.  An MSI is received by, and the status bit for the MSI is set in, the
> DWC PCI-e controller.
> 2.  dw_handle_msi_irq() calls a driver's registered interrupt handler
> for the MSI received.
> 3.  At some point, the interrupt handler must decide, correctly, that
> there is no more work to do and return.
> 4.  The hardware generates a new MSI.  As the MSI's status bit is still
> set, this new MSI is ignored.
> 6.  dw_handle_msi_irq() unsets the MSI status bit.
> 
> The MSI received at point 4 will never be acted upon.  It occurred after
> the driver had finished checking the hardware status for interrupt
> conditions to act on.  Since the MSI status was masked, it does not
> generated a new IRQ, neither when it was received nor when the MSI is
> unmasked.
> 
> It seems clear there is an unsolvable race here.
> 
> After this patch, the sequence becomes as follows:
> 
> 1.  An MSI is received and the status bit for the MSI is set in the
> DWC PCI-e controller.
> 2.  dw_handle_msi_irq() clears this MSI status bit.
> 3.  dw_handle_msi_irq() calls a driver's registered interrupt handler
> for the MSI received.
> 3.  At some point, the interrupt handler must decide, correctly, that
> there is no more work to do and return.
> 4.  The hardware generates a new MSI.  This sets the MSI status bit and
> triggers another interrupt in the interrupt controller(s) above the DWC
> PCI-e controller.  As the the dwc-pcie handler is not re-entrant, it is
> not run again at this time.
> 6.  dw_handle_msi_irq() finishes.  The MSI status bit remains set.
> 7.  The IRQ is re-raised and dw_handle_msi_irq() runs again.
> 8.  dw_handle_msi_irq() invokes the MSI's registered interrupt handler
> again as the status bit was still set.

Not sure why (5) is not used in your lists, I assume because you want
to highlight the race condition with the jump from 4 to 6 (or maybe
you do not like number 5 :), just curious).

> The observant will notice that point 4 present the opportunity for the
> SoC's interrupt controller to lose the interrupt in the same manner as
> the bug in this driver.  The driver for that interrupt controller will
> be written to properly deal with this.  In some cases the hardware
> supports an EOI concept, where the 2nd IRQ is masked and internally
> queued in the hardware, to be re-raised at EOI in step 7.  In other
> cases the IRQ will be unmasked and re-raised at step 4, but the kernel
> will see the handler is INPROGRESS and not re-invoke it, and instead set
> a PENDING flag, which causes the handler to re-run at step 7.
> 
> [1] https://patchwork.kernel.org/patch/3333681/
> 
> Fixes: 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled, not before")

I have two questions:

- Commit 8c934095fa2f has been in the kernel for a year and no
  regression was reported. It was assumed to fix a problem so before
  reverting it I want to make sure we are not breaking anything else.
- Your reasoning seems correct but I would pick Marc's brain on this
  because I want to understand if what this patch does is what IRQ core
  expects it to do, especially in relation to the IRQ chaining you
  are mentioning.

Lorenzo

> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Vignesh R <vigneshr@xxxxxx>
> Cc: Faiz Abbas <faiz_abbas@xxxxxx>
> Cc: Jingoo Han <jingoohan1@xxxxxxxxx>
> Cc: Gustavo Pimentel <gustavo.pimentel@xxxxxxxxxxxx>
> Cc: Joao Pinto <jpinto@xxxxxxxxxxxx>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Signed-off-by: Trent Piepho <tpiepho@xxxxxxxxxx>
> ---
>  drivers/pci/controller/dwc/pcie-designware-host.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index 29a05759a294..9a3960c95ad3 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -98,10 +98,10 @@ irqreturn_t dw_handle_msi_irq(struct pcie_port *pp)
>  			irq = irq_find_mapping(pp->irq_domain,
>  					       (i * MAX_MSI_IRQS_PER_CTRL) +
>  					       pos);
> -			generic_handle_irq(irq);
>  			dw_pcie_wr_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>  						(i * MSI_REG_CTRL_BLOCK_SIZE),
>  					    4, 1 << pos);
> +			generic_handle_irq(irq);
>  			pos++;
>  		}
>  	}
> -- 
> 2.14.4
>