On Thu, 2024-08-01 at 10:48 -0700, Paul E. McKenney wrote:
> >
> > yes I do.
> >
> > $ ps -eo pid,cpuid,comm | grep rcu
> >       4     0 kworker/R-rcu_gp
> >       8     0 kworker/0:0-rcu_gp
> >      14     0 rcu_tasks_rude_kthread
> >      15     0 rcu_tasks_trace_kthread
> >      17     3 rcu_sched
> >      18     3 rcuog/0
> >      19     0 rcuos/0
> >      20     0 rcu_exp_par_gp_kthread_worker/0
> >      21     3 rcu_exp_gp_kthread_worker
> >      31     3 rcuos/1
> >      38     3 rcuog/2
> >      39     3 rcuos/2
> >      46     0 rcuos/3
>
> This looks like you had either nohz_full=0-3 or rcu_nocbs=0-3, given
> that you have rcuos kthreads for all four of your CPUs. Or perhaps some
> other setting that implied one or the other of these.

The exact setting is:

isolcpus=0,1,2 nohz_full=1,2 rcu_nocbs=1,2 rcutree.rcu_nocb_gp_stride=4 irqaffinity=3

Maybe you can quickly confirm this, but from reading rcu/tree_nocb.h I
have been under the impression that nohz_full=1,2 implies rcu_nocbs=1,2,
so I could remove the rcu_nocbs parameter and it would not change
anything. (Are there other kthread implications that come along with
isolcpus?)

I want to have control over what runs on cpu0, so it is listed in
isolcpus. It is, however, not nohz_full. As I mentioned 1-2 emails ago,
cpu0 is where the networking I/O is done. Between net/core being such a
big RCU user and the NIC driver interrupts (I am trying hard to eliminate
them with napi_busy_poll), I figured that making cpu0 nohz_full
efficiently was a lost battle before it even started.

Maybe something somewhere is not used to seeing isolcpus and nohz_full
with different values and does something unexpected as a result...

>
> > > > > the absence of rcuog/1 is causing rcu_irq_work_resched() to raise
> > > > > an interrupt every 2-3 seconds on cpu1.
> > >
> > > Did you build with CONFIG_LAZY_RCU=y?
> >
> > no. I was not even aware that it was existing. I left alone the
> > default setting!
>
> Worth a try, as this is what it is designed for.

OK, I will, but from what I read about it, I was under the impression
that its main objective is to trade latency for reduced power
consumption. I do not see how batching callbacks would help eliminate
the interrupts that I am seeing. At best, they will be less frequent.

Most importantly, I am jumping through all these hoops to reduce my
system latency by a few microseconds, so power consumption is the least
of my worries. The only way I would consider it is if increasing RCU
latency improves my program's latency.

All options are considered, so I will take a look at this suggestion.
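
To compare the interrupt rate on cpu1 before and after such a change, something like the sketch below could be used. It is only an illustration: it assumes an x86 kernel, where /proc/interrupts has an "IWI" (IRQ work interrupts) row, and the helper names irq_counts(), delta() and sample() are made up for this example.

```python
import time


def irq_counts(text, label="IWI"):
    """Return the per-CPU counters of one /proc/interrupts row, or None.

    `label` is the row name without the colon; "IWI" is assumed to be the
    IRQ-work row (x86 naming, other architectures may differ).
    """
    lines = text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == label:
            # Keep only the numeric per-CPU columns, drop the description.
            return [int(field) for field in rest.split()[:ncpus]]
    return None


def delta(before, after):
    """Per-CPU difference between two counter snapshots."""
    return [b - a for a, b in zip(before, after)]


def sample(cpu=1, interval=10.0, label="IWI", path="/proc/interrupts"):
    """Count how many `label` interrupts hit `cpu` during `interval` seconds."""
    with open(path) as f:
        before = irq_counts(f.read(), label)
    time.sleep(interval)
    with open(path) as f:
        after = irq_counts(f.read(), label)
    if before is None or after is None:
        raise RuntimeError("row %r not found in %s" % (label, path))
    return delta(before, after)[cpu]
```

Running sample(cpu=1, interval=60) on each kernel build would give a rough interrupts-per-minute figure for cpu1. Note that this counts all IRQ-work interrupts on that CPU, not only the ones raised from rcu_irq_work_resched().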