> -----Original Message----- > From: Jason Gunthorpe [mailto:jgg@xxxxxxxx] > Sent: Tuesday, December 19, 2017 3:58 PM > To: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx> > Cc: dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Ruhl, Michael J > <michael.j.ruhl@xxxxxxxxx>; Patel, Jay P <jay.p.patel@xxxxxxxxx>; Sanchez, > Sebastian <sebastian.sanchez@xxxxxxxxx> > Subject: Re: [PATCH for-next 01/11] IB/hfi1: Destroy link_wq workqueue after > free_irq() > > On Mon, Dec 18, 2017 at 07:56:16PM -0800, Dennis Dalessandro wrote: > > > When kernel is built with CONFIG_DEBUG_SHIRQ config flag, an extra call > > to IRQ handler is made from _free_irq() function. The driver should be > > prepared for this fake call. > > It is not a 'fake call' it is call designed to test what happens > during an IRQ handler racing with free. Hi Jason, After further review it appears that our patch: commit 05cb18fda926ddce299280bd86cbc9d491306f28 IB/hfi1: Update HFI to use the latest PCI API has exposed an architecture issue in how we do our driver remove. Our current path does not adequately deal with the race condition of getting an IRQ during the removal. To address, we could submit a patch to revert the above commit. After re-architecting the driver remove path, we would re-submit the PCI API patch. Note: reverting the PCI API patch will not make the crash addressed in: IB/hfi1: Destroy link_wq workqueue after free_irq() issue go away, but will make it more difficult for the issue to occur. Thanks, Mike > > > Adding a mechanism which detects whether handler is invoked after > > disabling interrupts. hfi_intr_mask field is added to hfi1_devdata > > structure which is replica of interrupt mask register of hfi device. > > The field is updated while writing a value to register. > > And this is not the typical solution. > > Explain why adding a non-atomic variable to a concurrancy doesn't just > create races and a mess?? > > free_irq() is a synchronous fence, It returns once all the running IRQ > handlers have exited and guarantees they will not be called again. > > You are supposed to call it before destroying anything that the IRQ > handler could be using. > > re-ordering desstruction order is the typical solution. > > Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html