On Fri, 14 May 2021 22:25:50 +0200 Thomas Gleixner wrote:
> On Fri, May 14 2021 at 12:38, Jakub Kicinski wrote:
>
> > On Fri, 14 May 2021 12:17:19 +0200 Thomas Gleixner wrote:
> >> The driver invokes napi_schedule() in several places from task
> >> context. napi_schedule() raises the NET_RX softirq bit and relies on the
> >> calling context to ensure that the softirq is handled. That's usually on
> >> return from interrupt or on the outermost local_bh_enable().
> >>
> >> But that's not the case here, which causes the soft interrupt handling to
> >> be delayed to the next interrupt or local_bh_enable(). If the task in
> >> whose context this is invoked is the last runnable task on a CPU and the
> >> CPU goes idle before an interrupt arrives or a local_bh_disable/enable()
> >> pair handles the pending soft interrupt, then the NOHZ idle code emits
> >> the following warning:
> >>
> >>   NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
> >>
> >> Prevent this by wrapping the napi_schedule() invocation from task context
> >> into a local_bh_disable/enable() pair.
> >
> > I should have read through my inbox before replying :)
> >
> > I'd go for switching to raise_softirq_irqoff() in ____napi_schedule()...
> > why not?
>
> Except that some instruction cycle beancounters might complain about
> the extra conditional for the sane cases.
>
> But yes, I'm fine with that as well.

That's why this patch is marked RFC :)

When we're in the right context (irq/bh disabled etc.) the cost is just
a read of preempt_count() and a jump, right? And presumably
preempt_count() is in the cache already, because those sections aren't
very long.

Let me make this change locally and see if it is in any way perceptible.

Obviously, if anyone sees a way to solve the problem without much
ifdefinery and force_irqthreads checks, that'd be great - I don't.
I'd rather avoid pushing this kind of stuff out to the drivers.
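
For context, the fix the quoted changelog describes boils down to the
following pattern. This is a minimal sketch, not the actual patch;
foo_resume() and struct foo_priv are made-up names standing in for the
driver's task-context code path:

	/* Task context: napi_schedule() alone would only set the NET_RX
	 * pending bit, with nothing guaranteed to process it before the
	 * CPU can go idle. Wrapping it in a local_bh_disable/enable()
	 * pair ensures the raised softirq is handled when the outermost
	 * local_bh_enable() runs.
	 */
	static void foo_resume(struct foo_priv *priv)
	{
		local_bh_disable();
		napi_schedule(&priv->napi);
		local_bh_enable();
	}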
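
The alternative Jakub floats would move the fix into the core instead of
each driver. A sketch of the idea, assuming the ____napi_schedule() body
in net/core/dev.c as it looked at the time; only the raise call changes:

	static inline void ____napi_schedule(struct softnet_data *sd,
					     struct napi_struct *napi)
	{
		list_add_tail(&napi->poll_list, &sd->poll_list);
		/* raise_softirq_irqoff() instead of __raise_softirq_irqoff():
		 * besides setting the pending bit, it wakes ksoftirqd when
		 * invoked outside interrupt/bh-disabled context, so
		 * task-context callers no longer leave the softirq stranded.
		 */
		raise_softirq_irqoff(NET_RX_SOFTIRQ);
	}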
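
The "extra conditional" Thomas mentions, and the preempt_count() read
Jakub weighs, are visible in raise_softirq_irqoff() itself; shown here
as it appeared in kernel/softirq.c around that time, with the cost
annotated:

	inline void raise_softirq_irqoff(unsigned int nr)
	{
		__raise_softirq_irqoff(nr);	/* set the pending bit */

		/* in_interrupt() is a read of preempt_count() plus a
		 * branch. It is true in hardirq, softirq, and
		 * bh-disabled sections - the "sane cases" - where the
		 * softirq will run on irq_exit() or the outermost
		 * local_bh_enable() anyway. Otherwise, wake ksoftirqd
		 * so the softirq is scheduled soon.
		 */
		if (!in_interrupt())
			wakeup_softirqd();
	}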