On Wed, May 04, 2022 at 04:12:39PM +0100, Conor Dooley wrote:
> On 02/05/2022 20:22, Bjorn Helgaas wrote:
> > On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote:
> >> On Fri, 29 Apr 2022 22:57:33 +0100,
> >> Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >>> On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> >>>> On 28/04/2022 10:29, Lorenzo Pieralisi wrote:
> >>>>> On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@xxxxxxxxxxxxx wrote:
> >>>>>> From: Daire McNamara <daire.mcnamara@xxxxxxxxxxxxx>
> >>>>>>
> >>>>>> Clear MSI bit in ISTATUS register after reading it before
> >>>>>> handling individual MSI bits
> >
> >>>> Clear the MSI bit in ISTATUS register after reading it, but before
> >>>> reading and handling individual MSI bits from the IMSI register.
> >>>> This avoids a potential race where new MSI bits may be set on the
> >>>> IMSI register after it was read and be missed when the MSI bit in
> >>>> the ISTATUS register is cleared.
> >
> >>> Honestly, I don't understand enough about IRQs to determine whether
> >>> this is a correct fix. Hopefully Marc will chime in. All I really
> >>> know how to do is compare all the drivers and see which ones don't fit
> >>> the typical patterns.
> >>
> >> This seems sensible. In general, edge interrupts need an early Ack
> >> *before* the handler can be run. If it happens after, you're pretty
> >> much guaranteed to lose edges that would be generated between the
> >> handler and the late Ack.
> >>
> >> This can be implemented in HW in a variety of ways (read a register,
> >> write a register, or even both).
> >
> > Is this something that is or could be documented somewhere under
> > Documentation, e.g., "here are the common canonical patterns to use"?
> > I feel like an idiot because I have this kind of question all the time
> > and I never know how to confidently analyze it.
>
> Daire is still having the IT issues, so before I resend the patch with
> a new commit message, how is the following:
>
> Clear the MSI bit in ISTATUS_LOCAL register after reading it, but
> before reading and handling individual MSI bits from the ISTATUS_MSI
> register. This avoids a potential race where new MSI bits may be set
> on the ISTATUS_MSI register after it was read and be missed when the
> MSI bit in the ISTATUS_LOCAL register is cleared.

It is still unclear. You should translate what Marc said above
into how ISTATUS_MSI and ISTATUS_LOCAL work (i.e., describe how
the HW works).

Please describe what the registers do and use that to describe
the fix.

Thanks,
Lorenzo

> Reported by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Link: https://lore.kernel.org/linux-pci/20220127202000.GA126335@bhelgaas/
> Fixes: 6f15a9c9f941 ("PCI: microchip: Add Microchip PolarFire PCIe controller driver")
> Signed-off-by: Daire McNamara <daire.mcnamara@xxxxxxxxxxxxx>
> >
> >>> And speaking of that, I looked at all the users of
> >>> irq_set_chained_handler_and_data() in drivers/pci. All the handlers
> >>> except mc_handle_intx() and mc_handle_msi() call chained_irq_enter()
> >>> and chained_irq_exit().
> >>>
> >>> Are mc_handle_intx() and mc_handle_msi() just really special, or is
> >>> this a mistake?
> >>
> >> That's just a bug. On the right HW, this would just result in lost
> >> interrupts.
>
> Separate issue, separate patch. Do you want them in a series or as
> another standalone patch?
>
> Thanks,
> Conor.