Pete! On Fri, Jun 14 2024 at 21:42, Pete Swain wrote: > The flapping-irq detector still has a timebomb. > > A pathological workload, or test script, > can arm the spurious-irq timebomb described in > 4f27c00bf80f ("Improve behaviour of spurious IRQ detect") > > This leads to irqs being moved the much slower polled mode, > despite the actual unhandled-irq rate being well under the > 99.9k/100k threshold that the code appears to check. > > How? > - Queued completion handler, like nvme, servicing events > as they appear in the queue, even if the irq corresponding > to the event has not yet been seen. > > - queues frequently empty, so seeing "spurious" irqs > whenever the last events of a threaded handler's > while (events_queued()) process_them(); > ends with those events' irqs posted while thread was scanning. > In this case the while() has consumed last event(s), > so next handler says IRQ_NONE. > > - In each run of "unhandled" irqs, exactly one IRQ_NONE response > is promoted from IRQ_NONE to IRQ_HANDLED, by note_interrupt()'s > SPURIOUS_DEFERRED logic. > > - Any 2+ unhandled-irq runs will increment irqs_unhandled. > The time_after() check in note_interrupt() resets irqs_unhandled > to 1 after an idle period, but if irqs are never spaced more > than HZ/10 apart, irqs_unhandled keeps growing. > > - During processing of long completion queues, the non-threaded > handlers will return IRQ_WAKE_THREAD, for potentially thousands > of per-event irqs. These bypass note_interrupt()'s irq_count++ logic, > so do not count as handled, and do not invoke the flapping-irq > logic. > > - When the _counted_ irq_count reaches the 100k threshold, > it's possible for irqs_unhandled > 99.9k to force a move > to polling mode, even though many millions of _WAKE_THREAD > irqs have been handled without being counted. > > Solution: include IRQ_WAKE_THREAD events in irq_count. > Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD) > by the old 99:1 ratio will an irq be moved to polling mode. Nice detective work. Though I'm not entirely sure whether that's the correct approach as it might misjudge the situation where IRQ_WAKE_THREAD is issued but the thread does not make progress at all. Let me think about it some more. Thanks, tglx