Fabio,

On Thu, Sep 02 2021 at 23:51, Thomas Gleixner wrote:
> On Wed, Aug 18 2021 at 10:56, Paul E. McKenney wrote:
>> On Wed, Aug 18, 2021 at 02:02:17PM -0300, Fabio Estevam wrote:
>>> On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@xxxxxxxxx> wrote:
>>>
>>> With the debug patch and suggested command line, I get the following log:
>>> https://pastebin.com/raw/X96zKw7i
>
> And looking at that ftrace output in the pastebin there is nothing which
> raises NET_TX_SOFTIRQ but then the warning claims it is pending.
>
> This does not make any sense at all.

Looked once more at the trace output. It seems to be incomplete. The
last recording of softirq raise was at 379568us ~= 0.38s post boot, but
the splat comes about 20 seconds post boot.

Did your kernel trigger a WARN_ON before that splat? If so, that might
have disabled tracing.

As you are triggering this manually by invoking hostapd and the machine
should be still functional afterwards, can you please replace Paul's
debug patch with the one below?

Please remove the command line option and do the following:

 # echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_raise/enable
 # echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_entry/enable
 # echo 1 > /proc/sys/kernel/stack_tracer_enabled
 # hostapd ...

Once the warning triggered do:

 # cat /sys/kernel/debug/tracing/trace >trace.txt

That should give us the full trace data and hopefully a better
understanding of the problem.

Thanks,

        tglx
---
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6bffe5af8cb1..269f804090ef 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1015,6 +1015,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 
 		if (ratelimit < 10 && !local_bh_blocked() &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
+			tracing_off();
 			pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
 				(unsigned int) local_softirq_pending());
 			ratelimit++;
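

For a quick check of whether the captured trace really extends up to the
splat, something along the lines of the sketch below could be run over
trace.txt. It is not part of the patch. It assumes the default ftrace
text output, where each event line carries a "seconds.microseconds:"
timestamp followed by the event name and "vec=N"; the helper names
last_events() and report() are made up for illustration. If the newest
softirq_raise timestamp is far older than the time of the warning, the
trace is incomplete again.

```python
import re

# Assumed line shape of the default ftrace text output, e.g.:
#   <idle>-0   [000] d.h1   0.379568: softirq_raise: vec=1 [action=TIMER]
EVENT_RE = re.compile(
    r"\s(\d+\.\d+):\s+(softirq_raise|softirq_entry):\s+vec=(\d+)"
)


def last_events(lines):
    """Return {(event, vec): newest timestamp in seconds} for trace lines."""
    last = {}
    for line in lines:
        if line.startswith("#"):
            # Header/comment lines of the trace file carry no events.
            continue
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts, event, vec = float(m.group(1)), m.group(2), int(m.group(3))
        key = (event, vec)
        # Keep the newest timestamp seen for this event/vector pair.
        last[key] = max(last.get(key, ts), ts)
    return last


def report(path="trace.txt"):
    """Print the newest raise/entry time per softirq vector, oldest first."""
    with open(path) as f:
        last = last_events(f)
    for (event, vec), ts in sorted(last.items(), key=lambda kv: kv[1]):
        print(f"{ts:12.6f}s  {event} vec={vec}")
```

Calling report() in the directory holding trace.txt lists one line per
event and vector. A vector which shows up in the warning's pending mask
but has no softirq_raise entry at all in that list is the one to look at.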