Re: 6.1-rt: NOHZ tick-stop error: local softirq work is pending

On Mon, 2025-02-24 at 12:55 +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-21 09:32:41 [+0000], Bezdeka, Florian wrote:
> > Hi all,
> Hi,
> 
> > when stressing a 6.1-rt based system with network load we can
> > immediately see the following in the system log:
> > 
> > [  165.260690] NOHZ tick-stop error: local softirq work is pending, handler #80!!!
> > [  165.264689] NOHZ tick-stop error: local softirq work is pending, handler #80!!!
> > [  165.268687] NOHZ tick-stop error: local softirq work is pending, handler #80!!!
> 
> which version is this? I think is is an imported issue. Is v6.1.119-rt45
> also affected?

This is a typo, right? You mean it is an important issue, no?

We can see it on:

- v6.1.90-rt (Debian -rt kernel)
- v6.1.120-rt (Debian -rt kernel)
- v6.1.119-rt45 (So yes, this is also affected)
- v6.1.120-rt47
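
As a side note on the log lines quoted above: as far as I can tell, the value after "handler #" is the pending softirq bitmask printed in hex, not a vector index. The sketch below decodes it, assuming the vector order of the softirq enum in include/linux/interrupt.h as found in 6.1. The helper name decode_softirq_mask is made up for illustration.

```python
# Softirq vector names in the order of the kernel's softirq enum
# (assumed to match include/linux/interrupt.h in 6.1).
# Bit N of the pending mask corresponds to entry N of this list.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def decode_softirq_mask(mask_hex: str) -> list[str]:
    """Return the names of all softirqs set in a hex pending mask, e.g. '80'."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


# The value from the warning: "handler #80"
print(decode_softirq_mask("80"))
```

Under that assumption, #80 is bit 7, which is SCHED_SOFTIRQ rather than one of the network softirqs.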

> 
> > It seems that
> > 
> > 96c1fa04f089 ("tick/rcu: Fix false positive "softirq work is pending" messages")
> > 
> > tried to fix this issue, but for some reason it does not work.
> > 
> > Is that something that is really allowed to happen on RT (which means
> > that one of the conditions for the warning is still wrong) or a real
> > problem? We did not notice any negative impact on the system so far.
> > 
> > Input welcome...
> 
> The thing is that this may happen on PREEMPT_RT. Usually because
> softirqs can't be run as the CPU is blocked on locks and the lock-owner
> is either preempted on another CPU or blocked on something else.
> The thing is that a NO_HZ CPU should not go idle if there are softirqs
> pending as in "there is work to do, no nap for you". But as I explained
> earlier, on PREEMPT_RT it might happen that the work can't be handled
> and if no task can be run, the CPU goes to sleep.
> 
> That means if you replace with PERIODIC, the warning goes away. If you
> start a CPU-hog (one per CPU) the warning goes away.

With PERIODIC you mean CONFIG_HZ_PERIODIC, right?

We have CONFIG_NO_HZ_FULL=y set but do not set the nohz_full= cmdline
parameter, so we should end up with CONFIG_NO_HZ_IDLE behavior.

I realized today that the warning is somehow related to our RT tuning.
Enabling NAPI threading makes the warning go away, even if NAPI threads
are tuned the same way as ksoftirqd.
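
For anyone who wants to reproduce this: threaded NAPI can be switched per device through the "threaded" attribute under /sys/class/net/<iface>/. Below is a minimal sketch of how that could be scripted. The function name, the sysfs_root parameter (only there so the helper can be tried against a scratch directory) and the interface name "eth0" are placeholders, not part of our actual setup.

```python
from pathlib import Path


def set_napi_threaded(iface: str, enable: bool = True,
                      sysfs_root: str = "/sys/class/net") -> str:
    """Write 1/0 to <sysfs_root>/<iface>/threaded and return the value read back."""
    attr = Path(sysfs_root) / iface / "threaded"
    attr.write_text("1" if enable else "0")
    # Read the attribute back so the caller can verify the new state.
    return attr.read_text().strip()


# Hypothetical usage, needs root and a driver that uses NAPI:
#   set_napi_threaded("eth0")         # enable threaded NAPI
#   set_napi_threaded("eth0", False)  # back to softirq-based polling
```

The resulting NAPI kernel threads can then be given a scheduling policy and priority in the same way as ksoftirqd.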

I will have to look into that in more depth.

Thanks for your input Sebastian.

> 
> > Best regards,
> > Florian
> 
> Sebastian





