Re: NOHZ tick-stop error with ath10k SDIO

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Fri, 03 Sep 2021 10:07:38 +0200

Fabio,

On Thu, Sep 02 2021 at 23:51, Thomas Gleixner wrote:
> On Wed, Aug 18 2021 at 10:56, Paul E. McKenney wrote:
>> On Wed, Aug 18, 2021 at 02:02:17PM -0300, Fabio Estevam wrote:
>>> On Wed, Aug 18, 2021 at 1:29 PM Fabio Estevam <festevam@xxxxxxxxx> wrote:
>>>
>>> With the debug patch and suggested command line, I get the following log:
>>> https://pastebin.com/raw/X96zKw7i
>
> And looking at that ftrace output in the pastebin there is nothing which
> raises NET_TX_SOFTIRQ but then the warning claims it is pending.
>
> This does not make any sense at all.

Looked once more at the trace output. It seems to be incomplete. The
last recording of softirq raise was at 379568us ~= 0.38s post boot, but
the splat comes about 20 seconds post boot. Did your kernel trigger a
WARN_ON before that splat? If so, that might have disabled tracing.
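
To check a saved trace for this kind of truncation, something like the sketch below might help. It assumes the default ftrace text format, where raise events show up as "softirq_raise: vec=N [action=NAME]". The function names are made up for illustration.

```shell
# Hypothetical helpers for inspecting a saved ftrace text dump.

# Print the last softirq_raise record in the trace file given as $1.
# Its timestamp shows where the recording of raises actually ends.
last_softirq_raise() {
	grep 'softirq_raise:' "$1" | tail -n 1
}

# Count how often NET_TX was raised in the trace file given as $1.
# grep -c exits non-zero on zero matches, hence the "|| true".
net_tx_raises() {
	grep -c 'softirq_raise:.*action=NET_TX' "$1" || true
}
```

If the last raise is far older than the splat, the trace is most likely incomplete rather than the softirq having appeared from nowhere.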

As you are triggering this manually by invoking hostapd and the machine
should still be functional afterwards, can you please replace Paul's
debug patch with the one below? Please remove the command line option
and do the following:

# echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_raise/enable
# echo 1 >/sys/kernel/debug/tracing/events/irq/softirq_entry/enable
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# hostapd ...

Once the warning has triggered, do:

# cat /sys/kernel/debug/tracing/trace >trace.txt

That should give us the full trace data and hopefully a better
understanding of the problem.
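
If the test has to be repeated a few times, the steps above could be wrapped roughly as in the sketch below. TRACEFS, STACK_TRACER, OUT and the function names are made up for illustration. The defaults are the paths used in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps; the variables only exist
# so that the paths can be overridden.
TRACEFS="${TRACEFS:-/sys/kernel/debug/tracing}"
STACK_TRACER="${STACK_TRACER:-/proc/sys/kernel/stack_tracer_enabled}"
OUT="${OUT:-trace.txt}"

# Enable the two softirq trace events and the stack tracer knob.
enable_softirq_tracing() {
	for ev in softirq_raise softirq_entry; do
		echo 1 > "$TRACEFS/events/irq/$ev/enable" || return 1
	done
	echo 1 > "$STACK_TRACER"
}

# Copy the trace buffer to $OUT once the warning has fired.
save_trace() {
	cat "$TRACEFS/trace" > "$OUT"
}
```

The intended order would be enable_softirq_tracing, then hostapd, then save_trace after the warning. Since the patch calls tracing_off() when the warning fires, a second run needs "echo 1 > /sys/kernel/debug/tracing/tracing_on" first, otherwise nothing new gets recorded.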

Thanks,

        tglx
---

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6bffe5af8cb1..269f804090ef 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1015,6 +1015,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 
 		if (ratelimit < 10 && !local_bh_blocked() &&
 		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
+			tracing_off();
 			pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
 				(unsigned int) local_softirq_pending());
 			ratelimit++;