On Mon, 2 May 2022, Thomas Gleixner wrote: > But free_irq() stopped issuing synchronize_irq() with commit > 519cc8652b3a ("genirq: Synchronize only with single thread on > free_irq()"). And that turns out to be the root cause of the problem. > I should have caught that back then, but in hindsight .... > > While the proposed patch works, I think the real solution is to ensure > that both the hardware interrupt _and_ the interrupt threads which are > associated to the removed action are in quiescent state. This should > catch the case you observed. I can confirm that your patch works. And it also explains why I never had this issue on a 4.4 kernel with realtime patch ... > --- > Subject: genirq: Quiesce interrupt threads in free_irq() > From: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Date: Mon, 02 May 2022 15:40:25 +0200 > > Fill void... > > Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()") > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > --- > kernel/irq/manage.c | 25 +++++++++++++++++++++---- > 1 file changed, 21 insertions(+), 4 deletions(-) > > --- a/kernel/irq/manage.c > +++ b/kernel/irq/manage.c > @@ -1914,6 +1914,22 @@ static struct irqaction *__free_irq(stru > */ > __synchronize_hardirq(desc, true); > > + /* > + * Wait for associated interrupt threads to complete. This cannot > + * use synchronize_irq() due to interrupt sharing in the PCIe > + * layer. See 519cc8652b3a ("genirq: Synchronize only with single > + * thread on free_irq()") for further explanation. > + */ > + if (action->thread) { > + unsigned int thread_mask = action->thread_mask; > + > + if (action->secondary) > + thread_mask |= action->secondary->thread_mask; > + > + wait_event(desc->wait_for_threads, > + !(atomic_read(&desc->threads_active) & thread_mask)); > + } > + > #ifdef CONFIG_DEBUG_SHIRQ > /* > * It's a shared IRQ -- the driver ought to be prepared for an IRQ > @@ -1931,10 +1947,11 @@ static struct irqaction *__free_irq(stru > #endif > > /* > - * The action has already been removed above, but the thread writes > - * its oneshot mask bit when it completes. Though request_mutex is > - * held across this which prevents __setup_irq() from handing out > - * the same bit to a newly requested action. > + * The action has already been removed above and both the hardware > + * interrupt and the associated threads have been synchronized, > + * which means they are in quiescent state. request_mutex is still > + * held which prevents __setup_irq() from handing out action's > + * thread_mask to a newly requested action. > */ > if (action->thread) { > kthread_stop(action->thread); > > > Tested-by: Thomas Pfaff <tpfaff@xxxxxxx> Thank you, Thomas