On 07/11/2018 18:32, Trent Piepho wrote:
> On Wed, 2018-11-07 at 12:57 +0000, Gustavo Pimentel wrote:
>> On 06/11/2018 16:00, Marc Zyngier wrote:
>>> On 06/11/18 14:53, Lorenzo Pieralisi wrote:
>>>> On Sat, Oct 27, 2018 at 12:00:57AM +0000, Trent Piepho wrote:
>>>>>
>>>>> This gives the following race scenario:
>>>>>
>>>>> 1. An MSI is received by, and the status bit for the MSI is set in, the
>>>>> DWC PCI-e controller.
>>>>> 2. dw_handle_msi_irq() calls a driver's registered interrupt handler
>>>>> for the MSI received.
>>>>> 3. At some point, the interrupt handler must decide, correctly, that
>>>>> there is no more work to do and return.
>>>>> 4. The hardware generates a new MSI. As the MSI's status bit is still
>>>>> set, this new MSI is ignored.
>>>>> 6. dw_handle_msi_irq() unsets the MSI status bit.
>>>>>
>>>>> The MSI received at point 4 will never be acted upon. It occurred after
>>>>> the driver had finished checking the hardware status for interrupt
>>>>> conditions to act on. Since the MSI status was masked, it does not
>>>>> generated a new IRQ, neither when it was received nor when the MSI is
>>>>> unmasked.
>>>>>
>
>> This status register indicates whether exists or not a MSI interrupt on that
>> controller [0..7] to be handle.
>
> While the status for an MSI is set, no new interrupt will be triggered

Yes

> if another identical MSI is received, correct?

You cannot receive another identical MSI until you acknowledge the current one
(this is ensured by the PCI protocol, I guess).

>
>> In theory, we should clear the interrupt flag only after the interrupt has
>> actually handled (which can take some time to process on the worst case scenario).
>
> But see above, there is a race if a new MSI arrives while still masked.
> I can see no possible way to solve this in software that does not
> involve unmasking the MSI before calling the handler. To leave the
> interrupt masked while calling the handler requires the hardware to
> queue an interrupt that arrives while masked. We have no docs, but the
> designware controller doesn't appear to do this in practice.

See my reply to Marc about the interrupt masking. Like you said, the solution
probably involves using the interrupt mask/unmask register instead of the
interrupt enable/disable register.

Can you do a quick test, since you can easily reproduce the issue? Can you
change the register offset in both dw_pci_bottom_mask() and
dw_pci_bottom_unmask()?

Basically, replace the PCIE_MSI_INTR0_ENABLE register with PCIE_MSI_INTR0_MASK.

Thanks.
Gustavo

>
>> However, the Trent's patch allows to acknowledge the flag and handle the
>> interrupt later, giving the opportunity to catch a possible new interrupt, which
>> will be handle by a new call of this function.
>>
>>>
>>> What I'm interested in is the relationship this has with the mask/unmask
>>> callbacks, and whether masking the interrupt before acking it would help.
>>
>> Although there is the possibility of mask/unmask the interruptions on the
>> controller, from what I've seen typically in other dw drivers this is not done.
>> Probably we don't get much benefit from using it.
>>
>> Gustavo
>>
>>>
>>> Gustavo, can you help here?
>>>
>>> In any way, moving the action of acknowledging the interrupt to its
>>> right spot in the kernel (dw_pci_bottom_ack) would be a good start.
>>>
>>> Thanks,
>>>
>>> M.
>>>
>>
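
For illustration only, the race scenario quoted in this thread can be
reproduced in a toy model written as standalone C. This is a sketch under
stated assumptions, not the DWC driver code: the struct, the helper names and
the single 32-bit status word are all hypothetical, and the only hardware
behaviour modelled is the one described above (an MSI that arrives while its
status bit is still set is ignored). It compares clearing the status bit after
the handler returns with clearing it before the handler is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not the real DWC register layout). */
struct msi_model {
    uint32_t status;   /* latched MSI status bits, one per vector */
    unsigned work;     /* device events the driver has not serviced yet */
};

/* Device side: a new event occurs and the device sends an MSI.
 * Returns true if the MSI was latched, false if it was ignored
 * because the status bit for that vector was still set. */
static bool msi_raise(struct msi_model *m, unsigned vec)
{
    assert(vec < 32);
    m->work++;
    if (m->status & (1u << vec))
        return false;
    m->status |= 1u << vec;
    return true;
}

/* Acknowledge: clear the latched status bit for one vector. */
static void msi_ack(struct msi_model *m, unsigned vec)
{
    assert(vec < 32);
    m->status &= ~(1u << vec);
}

/* Driver side: service everything that is visible right now. */
static void driver_handler(struct msi_model *m)
{
    m->work = 0;
}

/* Replays steps 1-6 of the quoted scenario for vector 0 and returns the
 * number of device events left unserviced at the end. */
unsigned run_scenario(bool ack_before_handler)
{
    struct msi_model m = { 0, 0 };

    msi_raise(&m, 0);               /* step 1: MSI received, bit set */
    if (ack_before_handler)
        msi_ack(&m, 0);             /* ack first, handle afterwards */
    driver_handler(&m);             /* steps 2-3: handler runs, returns */
    msi_raise(&m, 0);               /* step 4: new MSI after the handler */
    if (!ack_before_handler)
        msi_ack(&m, 0);             /* step 6: bit cleared too late */

    /* The handler is dispatched again only if a status bit is latched. */
    if (m.status & 1u) {
        msi_ack(&m, 0);
        driver_handler(&m);
    }
    return m.work;
}
```

In this model, run_scenario(false) leaves one event unserviced, which is the
lost MSI from step 4, while run_scenario(true) leaves none because the second
MSI is latched and dispatched on the next pass. Whether the real controller
behaves the same way when the mask register is used instead is exactly what
the test requested above would show.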