Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI

Trent Piepho <tpiepho@xxxxxxxxxx> · Thu, 8 Nov 2018 20:51:38 +0000

On Thu, 2018-11-08 at 11:46 +0000, Gustavo Pimentel wrote:
> On 07/11/2018 18:32, Trent Piepho wrote:
> > On Wed, 2018-11-07 at 12:57 +0000, Gustavo Pimentel wrote:
> > > On 06/11/2018 16:00, Marc Zyngier wrote:
> > > > On 06/11/18 14:53, Lorenzo Pieralisi wrote:
> > > > > On Sat, Oct 27, 2018 at 12:00:57AM +0000, Trent Piepho wrote:
> > > > > > 
> > > > > > This gives the following race scenario:
> > > > > > 
> > > > > > 1.  An MSI is received by, and the status bit for the MSI is set in, the
> > > > > > DWC PCI-e controller.
> > > > > > 2.  dw_handle_msi_irq() calls a driver's registered interrupt handler
> > > > > > for the MSI received.
> > > > > > 3.  At some point, the interrupt handler must decide, correctly, that
> > > > > > there is no more work to do and return.
> > > > > > 4.  The hardware generates a new MSI.  As the MSI's status bit is still
> > > > > > set, this new MSI is ignored.
> > > > > > 5.  dw_handle_msi_irq() clears the MSI status bit.
> > > > > > 
> > > > > > The MSI received at point 4 will never be acted upon.  It occurred after
> > > > > > the driver had finished checking the hardware status for interrupt
> > > > > > conditions to act on.  Since the MSI status was masked, it did not
> > > > > > generate a new IRQ, either when it was received or when the MSI was
> > > > > > unmasked.
> > > > > > 
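To make the race concrete, here is a toy C model of a single MSI vector. It is a hypothetical sketch, not driver code: `struct msi_vec`, `msi_arrive()` and `work_stranded()` are made-up names. The only hardware behaviour assumed is the one described above: while the status bit is set, a new MSI is absorbed without raising another IRQ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of one DWC MSI vector; none of these names come
 * from the driver.  While the status bit is set, a newly arriving MSI is
 * merged into it and raises no new interrupt.
 */
struct msi_vec {
	bool status;     /* latched "MSI received" bit */
	int irqs_raised; /* interrupts signalled to the parent controller */
};

static void msi_arrive(struct msi_vec *v)
{
	if (!v->status) {
		v->status = true;
		v->irqs_raised++;
	}
	/* else: the status bit is already set, so this MSI is absorbed */
}

/*
 * One pass of MSI handling.  'clear_first' clears the status bit before
 * the driver handler runs; false gives the original ordering (handler
 * first, clear afterwards).  'race' injects new device work, plus its
 * MSI, after the driver handler has decided it is done.
 * Returns true if work is left over with no status bit set to report it.
 */
static bool work_stranded(bool clear_first, bool race)
{
	struct msi_vec v = { false, 0 };
	int device_work = 1;         /* step 1: the device posts work ... */

	msi_arrive(&v);              /* ... and sends an MSI */
	assert(v.irqs_raised == 1);

	if (clear_first)
		v.status = false;
	device_work = 0;             /* steps 2-3: driver handler drains all work */
	if (race) {
		device_work++;           /* step 4: new work and a new MSI */
		msi_arrive(&v);
	}
	if (!clear_first)
		v.status = false;        /* last step: status bit cleared */

	return device_work > 0 && !v.status;
}
```

With the original ordering and a racing MSI, `work_stranded(false, true)` returns true. Clearing the status bit before the driver handler runs, `work_stranded(true, true)`, leaves the new MSI's status bit set, so another interrupt will follow.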
> > > This status register indicates whether or not an MSI interrupt is pending
> > > on that controller [0..7] and needs to be handled.
> > 
> > While the status for an MSI is set, no new interrupt will be triggered
> 
> Yes
> 
> > if another identical MSI is received, correct?
> 
> You cannot receive another identical MSI until you acknowledge the current one
> (this is ensured by the PCI protocol, I guess).

I don't believe this is a requirement of PCI.  We designed our hardware
to not send another MSI until our hardware's interrupt status register
is read, but we didn't have to do that.

> > > In theory, we should clear the interrupt flag only after the interrupt has
> > > actually been handled (which can take some time in the worst-case scenario).
> > 
> > But, as explained above, there is a race if a new MSI arrives while it is still masked.
> >  I can see no possible way to solve this in software that does not
> > involve unmasking the MSI before calling the handler.  To leave the
> > interrupt masked while calling the handler requires the hardware to
> > queue an interrupt that arrives while masked.  We have no docs, but the
> > designware controller doesn't appear to do this in practice.
> 
> See my reply to Marc about the interrupt masking. Like you said, the solution
> probably involves using the interrupt mask/unmask register instead of the
> interrupt enable/disable register.
> 
> Can you do a quick test, since you can easily reproduce the issue? Can you
> change the register offset in both dw_pci_bottom_mask() and
> dw_pci_bottom_unmask()?
> 
> Basically, replace the PCIE_MSI_INTR0_ENABLE register with PCIE_MSI_INTR0_MASK.

Of course, MSIs still need to be enabled to work at all, which happens
once, when the driver using the MSI registers a handler.  Masking can be
done via the mask register after that.
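A rough model of why the register choice matters. We have no documentation, so the semantics below are an assumption: clearing a bit in PCIE_MSI_INTR0_ENABLE is taken to drop an arriving MSI outright, while setting a bit in PCIE_MSI_INTR0_MASK is taken to latch the MSI in PCIE_MSI_INTR0_STATUS but hold off the controller's interrupt output. The struct and function names are hypothetical and only mirror the register names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the per-vector bits of one DWC MSI controller
 * (one 32-bit register of each kind).  ASSUMED semantics, not documented:
 *   - ENABLE clear: an arriving MSI is dropped and never recorded;
 *   - MASK set:     an arriving MSI is recorded in STATUS, but the
 *                   interrupt output is held off;
 *   - output asserted while any vector has STATUS set and MASK clear.
 */
struct dwc_msi_ctrl {
	unsigned int enable;  /* models PCIE_MSI_INTR0_ENABLE */
	unsigned int mask;    /* models PCIE_MSI_INTR0_MASK */
	unsigned int status;  /* models PCIE_MSI_INTR0_STATUS */
};

static void msi_arrive(struct dwc_msi_ctrl *c, unsigned int vec)
{
	assert(vec < 32);
	if (c->enable & (1u << vec))
		c->status |= 1u << vec;
	/* else: vector disabled, the MSI is simply lost */
}

static bool irq_output(const struct dwc_msi_ctrl *c)
{
	return (c->status & ~c->mask) != 0;
}

/* "Masking" through ENABLE, as dw_pci_bottom_mask() does today. */
static bool pending_after_enable_masking(unsigned int vec)
{
	struct dwc_msi_ctrl c = { 1u << vec, 0, 0 };

	c.enable &= ~(1u << vec);     /* "mask" */
	msi_arrive(&c, vec);          /* MSI arrives while "masked" */
	c.enable |= 1u << vec;        /* "unmask" */
	return irq_output(&c);
}

/* Masking through MASK, with ENABLE left set. */
static bool pending_after_mask_masking(unsigned int vec)
{
	struct dwc_msi_ctrl c = { 1u << vec, 0, 0 };

	c.mask |= 1u << vec;
	msi_arrive(&c, vec);
	assert(!irq_output(&c));      /* held off while masked */
	c.mask &= ~(1u << vec);
	return irq_output(&c);
}
```

Under those assumptions, `pending_after_enable_masking(3)` is false (the MSI is gone for good), while `pending_after_mask_masking(3)` is true (the interrupt fires as soon as the vector is unmasked).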

It is not so easy for me to test on the newest kernel, as imx7d does
not work due to yet more bugs.  I have to port a set of patches to each
new kernel, a set that does not shrink due to holdups like this.

As I understand it, the new flow would look like this (assuming the dwc
controller's MSI interrupt output signal is connected to one of the ARM
GIC interrupt lines; there could, of course, be different or additional
controllers above the dwc, but usually there aren't):

1. MSI arrives, status bit is set in dwc, interrupt raised to GIC.
2. dwc handler runs
3. dwc handler sees status bit is set for a(n) MSI(s)
4. dwc handler sets mask for those MSIs
5. dwc handler clears status bit
6. dwc handler runs driver handler for the received MSI
7. ** a new MSI arrives, racing with 6 **
8. status bit becomes set again, but no interrupt is raised due to mask
9. dwc handler unmasks MSI, which raises the interrupt to GIC because
of new MSI received in 7.
10. The original GIC interrupt is EOI'ed.
11. The interrupt for the dwc is re-raised by the GIC due to 9, and we
go back to 2.
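The eleven steps can be walked through with a small C model. It is a hypothetical sketch with made-up names, and it assumes the dwc output behaves like an edge into the GIC: a low-to-high transition latches the line as pending, and a pending line is not delivered again until the current interrupt is EOI'ed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of one MSI vector behind the dwc, whose output
 * feeds one GIC line.  Assumed: a low-to-high transition of the dwc
 * output latches "pending" in the GIC, and the GIC does not deliver the
 * line again until the current interrupt is EOI'ed.
 */
struct sim {
	bool status, mask;   /* dwc per-vector bits */
	bool out;            /* dwc interrupt output level */
	bool gic_pending;    /* edge latched by the GIC */
	int device_work;     /* work posted by the endpoint */
	int handler_runs;    /* passes through the dwc handler */
};

static void update_output(struct sim *s)
{
	bool level = s->status && !s->mask;

	if (level && !s->out)
		s->gic_pending = true;   /* rising edge seen by the GIC */
	s->out = level;
}

static void msi_arrive(struct sim *s)
{
	s->device_work++;
	s->status = true;
	update_output(s);
}

/* Steps 2-9; 'race' injects the MSI of step 7. */
static void dwc_handler(struct sim *s, bool race)
{
	s->handler_runs++;                      /* step 2 */
	assert(s->status);                      /* step 3 */
	s->mask = true;    update_output(s);    /* step 4 */
	s->status = false; update_output(s);    /* step 5 */
	s->device_work = 0;                     /* step 6: driver handler drains work */
	if (race)
		msi_arrive(s);                      /* steps 7-8: status set, output held low */
	s->mask = false;   update_output(s);    /* step 9: edge if status is set again */
}

/* Steps 1, 10 and 11: deliver GIC interrupts until none is pending. */
static int count_handler_runs(bool race)
{
	struct sim s = { false, false, false, false, 0, 0 };

	msi_arrive(&s);                         /* step 1 */
	while (s.gic_pending) {
		s.gic_pending = false;              /* GIC delivers; line becomes active */
		dwc_handler(&s, race);
		race = false;                       /* only one racing MSI */
		/* step 10: EOI; a newly latched edge loops back to step 2 (step 11) */
	}
	assert(s.device_work == 0);             /* no work was stranded */
	return s.handler_runs;
}
```

`count_handler_runs(false)` should return 1 and `count_handler_runs(true)` 2: the MSI from step 7 is not lost, it just costs one more pass through the dwc handler.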

It is very important that 5 be done before 6.  Doing 4 before 5 matters
less, but reversing those two would re-raise the interrupt even if the
2nd MSI arrived before the driver handler ran, which is unnecessary,
since the driver handler would see that MSI's work anyway.
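The same kind of toy model, again with hypothetical names and the same assumption that a rising edge on the dwc output latches "pending" in the GIC, shows what reversing 4 and 5 costs when the 2nd MSI lands between them.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model comparing step 4 (mask) before step 5 (clear
 * status) against the reverse, when a 2nd MSI lands between the two.
 * Assumed: a rising edge on the dwc output latches "pending" in the GIC,
 * which is only delivered again after the current interrupt is EOI'ed.
 */
struct sim {
	bool status, mask, out, gic_pending;
	int device_work, passes;
};

static void update_output(struct sim *s)
{
	bool level = s->status && !s->mask;

	if (level && !s->out)
		s->gic_pending = true;
	s->out = level;
}

static void msi_arrive(struct sim *s)
{
	s->device_work++;
	s->status = true;
	update_output(s);
}

static void set_mask(struct sim *s, bool m)
{
	s->mask = m;
	update_output(s);
}

static void clear_status(struct sim *s)
{
	s->status = false;
	update_output(s);
}

/* Returns how many times the dwc handler runs. */
static int dwc_passes(bool mask_first)
{
	struct sim s = { false, false, false, false, 0, 0 };
	bool race = true;

	msi_arrive(&s);                      /* step 1 */
	while (s.gic_pending) {
		s.gic_pending = false;           /* GIC delivers; line becomes active */
		s.passes++;
		if (mask_first) {
			set_mask(&s, true);          /* step 4 */
			if (race)
				msi_arrive(&s);          /* 2nd MSI merges into STATUS */
			clear_status(&s);            /* step 5 */
		} else {
			clear_status(&s);            /* step 5 */
			if (race)
				msi_arrive(&s);          /* 2nd MSI sets STATUS again */
			set_mask(&s, true);          /* step 4 */
		}
		race = false;
		s.device_work = 0;               /* step 6: driver handler sees both */
		set_mask(&s, false);             /* step 9 */
	}
	assert(s.device_work == 0);
	return s.passes;
}
```

With mask first, `dwc_passes(true)` returns 1: the 2nd MSI merges into the still-set status bit and the driver handler services both. With clear first, `dwc_passes(false)` returns 2, and the second pass calls the driver handler with no work left to do.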

I do not see a race in this design, and it appears correct to me.  But
I also do not think the extra steps of masking and unmasking the MSI
bring any immediate improvement.

The reason is that the GIC interrupt above the dwc is non-reentrant.
It remains masked (aka active[1]) during this entire process (1 to 10).
This means every MSI is effectively already masked.  So masking the
active MSI(s) a 2nd time gains nothing besides preventing some extra
edges for a masked interrupt going to the ARM GIC.
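To illustrate why the extra masking gains little while the GIC line is non-reentrant, here is a toy model of the GIC states from footnote [1]. The names are hypothetical, and it assumes a rising edge on the dwc output makes the line pending. It compares clearing the status bit before the driver handler with and without also masking the vector in the dwc.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of one GIC line fed by the dwc MSI output, with
 * the states from footnote [1].  Assumed: a rising edge on the dwc output
 * makes the line pending, and an active line is not delivered again
 * until it is EOI'ed.
 */
enum gic_state { INACTIVE, PENDING, ACTIVE, ACTIVE_PENDING };

struct sim {
	enum gic_state gic;
	bool status, mask, out;  /* dwc per-vector bits and output level */
	int device_work, passes;
};

static void update_output(struct sim *s)
{
	bool level = s->status && !s->mask;

	if (level && !s->out) {              /* rising edge into the GIC */
		if (s->gic == INACTIVE)
			s->gic = PENDING;
		else if (s->gic == ACTIVE)
			s->gic = ACTIVE_PENDING;     /* held off while active */
	}
	s->out = level;
}

static void msi_arrive(struct sim *s)
{
	s->device_work++;
	s->status = true;
	update_output(s);
}

/*
 * A 2nd MSI arrives while the driver handler runs.  'dwc_mask' selects
 * whether the dwc handler also masks the vector around the driver handler.
 * Returns the number of dwc handler passes needed to drain all work.
 */
static int passes_needed(bool dwc_mask)
{
	struct sim s = { INACTIVE, false, false, false, 0, 0 };
	bool race = true;

	msi_arrive(&s);
	while (s.gic == PENDING) {
		s.gic = ACTIVE;                  /* CPU takes the interrupt */
		s.passes++;
		if (dwc_mask) {
			s.mask = true;
			update_output(&s);
		}
		s.status = false;                /* cleared before the driver handler */
		update_output(&s);
		s.device_work = 0;               /* driver handler drains work */
		if (race) {
			msi_arrive(&s);              /* the racing MSI */
			race = false;
		}
		if (dwc_mask) {
			s.mask = false;
			update_output(&s);
		}
		/* EOI: ACTIVE -> INACTIVE, ACTIVE_PENDING -> PENDING */
		s.gic = (s.gic == ACTIVE_PENDING) ? PENDING : INACTIVE;
	}
	assert(s.device_work == 0);
	return s.passes;
}
```

In this model both `passes_needed(false)` and `passes_needed(true)` return 2 and leave no work behind; the dwc mask only changes whether the edge into the GIC happens when the racing MSI arrives or when the vector is unmasked.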

In theory, if the GIC interrupt handler were reentrant, then on receipt
of a new MSI we could re-enter the dwc handler on a different CPU and
run the handler for the new MSI (a different MSI!) while the original
MSI handler is still running.

The difference here is that, by unmasking the interrupt in the GIC
before the dwc handler is finished, masking an individual MSI in the
dwc is no longer a redundant 2nd masking.

[1] When I say masked in GIC, I mean the interrupt is in the "active"
or "active and pending" states.  In these states the interrupt will not
be raised to the CPU and can be considered masked.