On Sat, Apr 30, 2022 at 12:33:51AM +0100, Marc Zyngier wrote:
> On Fri, 29 Apr 2022 22:57:33 +0100,
> Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Fri, Apr 29, 2022 at 09:42:52AM +0000, Conor.Dooley@xxxxxxxxxxxxx wrote:
> > > On 28/04/2022 10:29, Lorenzo Pieralisi wrote:
> > > > On Tue, Apr 05, 2022 at 12:17:51PM +0100, daire.mcnamara@xxxxxxxxxxxxx wrote:
> > > >> From: Daire McNamara <daire.mcnamara@xxxxxxxxxxxxx>
> > > >>
> > > >> Clear MSI bit in ISTATUS register after reading it before
> > > >> handling individual MSI bits
> > > Clear the MSI bit in ISTATUS register after reading it, but before
> > > reading and handling individual MSI bits from the IMSI register.
> > > This avoids a potential race where new MSI bits may be set on the
> > > IMSI register after it was read and be missed when the MSI bit in
> > > the ISTATUS register is cleared.
> > Honestly, I don't understand enough about IRQs to determine whether
> > this is a correct fix. Hopefully Marc will chime in. All I really
> > know how to do is compare all the drivers and see which ones don't fit
> > the typical patterns.
>
> This seems sensible. In general, edge interrupts need an early Ack
> *before* the handler can be run. If it happens after, you're pretty
> much guaranteed to lose edges that would be generated between the
> handler and the late Ack.
>
> This can be implemented in HW in a variety of ways (read a register,
> write a register, or even both).

Is this something that is or could be documented somewhere under
Documentation, e.g., "here are the common canonical patterns to use"?
I feel like an idiot because I have this kind of question all the time
and I never know how to confidently analyze it.

> > And speaking of that, I looked at all the users of
> > irq_set_chained_handler_and_data() in drivers/pci. All the handlers
> > except mc_handle_intx() and mc_handle_msi() call chained_irq_enter()
> > and chained_irq_exit().
> >
> > Are mc_handle_intx() and mc_handle_msi() just really special, or is
> > this a mistake?
>
> That's just a bug. On the right HW, this would just result in lost
> interrupts.

I wonder if coccinelle or some other static analyzer would be smart
enough to find this kind of error.

Bjorn
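
For illustration only, the early-Ack vs. late-Ack race described above can
be sketched as a small user-space simulation. This is not the driver code:
struct fake_hw, raise_msi(), handle_once() and run_scenario() are
hypothetical names, and the register model (one ISTATUS summary bit plus a
32-bit IMSI pending mask) is a simplifying assumption based only on the
patch description quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one latched summary bit plus per-vector pending bits. */
struct fake_hw {
	bool istatus_msi;	/* stands in for the MSI bit in ISTATUS */
	uint32_t imsi;		/* stands in for the IMSI pending mask */
};

/* A device signalling MSI 'vec' sets its IMSI bit and the summary bit. */
static void raise_msi(struct fake_hw *hw, unsigned int vec)
{
	hw->imsi |= UINT32_C(1) << vec;
	hw->istatus_msi = true;
}

/*
 * One pass of the handler. If inject_vec >= 0, a new MSI arrives right
 * after the IMSI snapshot has been read, i.e. inside the race window.
 * Returns the number of vectors handled in this pass.
 */
static unsigned int handle_once(struct fake_hw *hw, bool early_ack,
				int inject_vec)
{
	unsigned int handled = 0;
	unsigned int vec;
	uint32_t pending;

	if (!hw->istatus_msi)
		return 0;

	if (early_ack)
		hw->istatus_msi = false;	/* Ack before reading IMSI */

	pending = hw->imsi;			/* snapshot of pending vectors */

	if (inject_vec >= 0)
		raise_msi(hw, (unsigned int)inject_vec);

	for (vec = 0; vec < 32; vec++) {
		uint32_t bit = UINT32_C(1) << vec;

		if (pending & bit) {
			hw->imsi &= ~bit;	/* clear and "handle" this vector */
			handled++;
		}
	}

	if (!early_ack)
		hw->istatus_msi = false;	/* late Ack wipes the new edge too */

	return handled;
}

/*
 * Vector 0 fires, then vector 1 fires while the handler is running.
 * The handler is re-entered only while the summary bit is set.
 */
static unsigned int run_scenario(bool early_ack)
{
	struct fake_hw hw = { false, 0 };
	unsigned int handled;

	raise_msi(&hw, 0);
	handled = handle_once(&hw, early_ack, 1);

	while (hw.istatus_msi)
		handled += handle_once(&hw, early_ack, -1);

	return handled;
}
```

In this model, run_scenario(true) handles both vectors, because the second
MSI sets the summary bit again after the early Ack and the handler is
re-entered. run_scenario(false) handles only one: the late Ack clears the
summary bit that the second MSI had just set, so its IMSI bit stays pending
with nothing left to signal it.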