Re: 6.1-rt: NOHZ tick-stop error: local softirq work is pending

On Wed, 2025-02-26 at 10:17 +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-25 16:16:25 [+0100], Florian Bezdeka wrote:
> > > which version is this? I think it is an imported issue. Is
> > > v6.1.119-rt45 also affected?
> > 
> > This is a typo, right? You mean it is an important issue, no?
> 
> No, the version is correct. And I meant "imported" as in we got it from
> the stable queue.
> 
> > We can see that on
> > 
> > - v6.1.90-rt (Debian -rt kernel)
> > - v6.1.120-rt (Debian -rt kernel)
> > - v6.1.119-rt45 (So yes, this is also affected)
> > - v6.1.120-rt47
> 
> But if this is visible on v6.1.90-rt then it is not originating from
> what I assumed.
> 
> > With PERIODIC you mean CONFIG_HZ_PERIODIC, right?
> correct.
> 
> > We have CONFIG_NO_HZ_FULL=y set but do not set the nohz_full= cmdline
> > parameter, so that we should get CONFIG_NO_HZ_IDLE behavior at the end.
> > 
> > I realized today that the warning is somehow related to our RT tuning.
> > Enabling NAPI threading makes the warning go away, even if NAPI threads
> > are tuned the same way as ksoftirqd.
> NAPI threads? You have RPS enabled by any chance?
> Would commit
>     dad6b97702639 ("net: Allow to use SMP threads for backlog NAPI.")
>     80d2eefcb4c84 ("net: Use backlog-NAPI to clean up the defer_list.")

Hi,

I tried a backport of the two patches to 6.1.120-rt47, but for that a
lot of infrastructure needs to be backported as well. In a minimal
setting, I was able to reduce that to the following patches:

80d2eefcb4c84 net: Use backlog-NAPI to clean up the defer_list.
be12a1fe298e8 net: skbuff: add skb_append_pagefrags and use it
dad6b97702639 net: Allow to use SMP threads for backlog NAPI.
87eff2ec57b6d net: optimize napi_threaded_poll() vs RPS/RFS
8fcb76b934daf net: napi_schedule_rps() cleanup
a1aaee7f8f79d net: make napi_threaded_poll() aware of sd->defer_list

This, however, requires CONFIG_PAGE_POOL=n and CONFIG_DEVMEM=n, as the
page_pool_create_percpu parts added in 2b0cfa6e49566 ("net: add generic
percpu page_pool allocator") are not easy to backport.
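
A minimal sketch (Python, not part of the backport) for double-checking
that a given .config really has both options off before building. The
function names and the sample excerpt are made up for illustration; only
the two option names come from the text above.

```python
import re

# Match "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines of a kernel .config.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value}; options marked 'is not set' map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def still_enabled(text, must_be_off=("CONFIG_PAGE_POOL", "CONFIG_DEVMEM")):
    """List the options from must_be_off that are enabled (=y or =m) in text."""
    options = parse_kconfig(text)
    return [name for name in must_be_off if options.get(name) in ("y", "m")]


# Hypothetical .config excerpt, for illustration only.
SAMPLE = """\
CONFIG_NO_HZ_FULL=y
# CONFIG_PAGE_POOL is not set
CONFIG_DEVMEM=y
"""

if __name__ == "__main__":
    # Prints the options that would still conflict with the reduced backport.
    print(still_enabled(SAMPLE))
```

An option that is absent from the file is treated the same as one that is
not set, which matches how a .config normally represents disabled options.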

With these settings we were not able to run our test workload that
reproduces the warning, so I simply can't tell whether the warning still
reproduces with the backport applied.

Best regards,
Felix

> 
> help?
> 
> > I will have to look into that in more depth.
> > 
> > Thanks for your input Sebastian.
> 
> You are welcome.
> 
> Sebastian

-- 
Siemens AG
Linux Expert Center
Friedrich-Ludwig-Bauer-Str. 3
85748 Garching, Germany





