On Fri, 2024-08-02 at 06:58 -0700, Paul E. McKenney wrote:
> On Fri, Aug 02, 2024 at 09:46:03AM -0400, Olivier Langlois wrote:
> > On Thu, 2024-08-01 at 20:01 -0700, Paul E. McKenney wrote:
> > >
> > > Very good!!!
> > >
> > > The do_nocb_deferred_wakeup_timer() is due to call_rcu() being
> > > invoked in a context where it might not be safe to do a wakeup().
> > > RCU doesn't have a lot of choice in this situation, so the usual
> > > approach is to figure out what is invoking call_rcu() on your
> > > nohz_full CPUs and to make it run elsewhere.
> > >
> > > I don't know what is happening with mix_interrupt_randomness().
> > >
> > > Thanx, Paul
> >
> > there few more that are popping out like:
> >
> > tsc_sync_check_timer_fn
> > mce_timer_fn
> >
> > but those 2 + do_nocb_deferred_wakeup_timer are not immediately
> > generating an interrupt. Only mix_interrupt_randomness does because
> > it adds an already timed out timer. So the CPU is kicked on
> > insertion.
> >
> > I have quickly looked at drivers/char/random.c
> >
> > and there is no obvious way to address this that I can think of
> > without causing potential serious side-effects...
> >
> > but I really find mysterious that only 1 of my nohz_full cpus is
> > impacted this...
> >
> > and imho, this does not sound like a good idea to include interrupt
> > randomness of a nohz_full cpu...
> >
> > I think that I am going to throw down the towel of reaching the
> > goal of 100% interrupt free for now. The amount of efforts required
> > to reach the goal vs the diminishing result I can get is not a good
> > deal. For now, I am going to tolerate this 27uSec interrupt once
> > every 2-3 seconds...
> >
> > but I find this challenge very fascinating and I'll start to follow
> > Brendan Gregg's blog to learn more about the field.
> >
> > thank you very much for your assistance. I am leaving with an
> > impression that the rcu dev list is very helpful and friendly!
>
> Are you doing system calls on your worker CPUs? If so, one
> straightforward way to get rid of this is to make your application
> push the system calls off to the housekeeping CPU. Keep in mind that
> system calls often need to defer work of one sort or another.
>
> The real-time guys would know more about this sort of thing.
>
> Thanx, Paul

Very few. I signal a pthread condition variable, which calls:
- gettid()
- futex()

and madvise(MADV_DONTNEED) (I believe this comes from tcmalloc).

You are opening up my horizons and I think you are right. If the
nohz_full thread does not enter the kernel, the kernel cannot interfere
with the nohz_full setup.

I need to take a break from this project to take care of other stuff
that I have neglected while being absorbed by this never-ending
challenge... but I'll definitely return to it with this new angle of
attack...

With so few syscalls, it seems feasible to avoid them one way or
another.

At some point, it might be much easier to avoid the kernel than to
fight with it to make it do what you want.

Here is another side note. I am currently listening to your talk about
NO_HZ_FULL and I am enjoying it very much!

It made me realize that you were right about the odd detail that my
setup has 4 rcuos kthreads... I do not understand either why I end up
with 4.

In your talk you mention CONFIG_RCU_NOCB_CPU_ALL (now
CONFIG_RCU_NOCB_CPU_DEFAULT_ALL, I believe). I thought that maybe I had
this option set unknowingly...
No, I don't:

/proc $ zcat config.gz | grep RCU
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_NEED_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set
# CONFIG_RCU_LAZY is not set
# end of RCU Subsystem
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
# CONFIG_RCU_CPU_STALL_CPUTIME is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging
# CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is not set

boot params:
isolcpus=0,1,2 nohz_full=1,2 rcu_nocbs=1,2 rcutree.rcu_nocb_gp_stride=4 irqaffinity=3

dmesg output:
Aug 02 05:41:01 aws-dublin kernel: rcu: Hierarchical RCU implementation.
Aug 02 05:41:01 aws-dublin kernel: rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=4.
Aug 02 05:41:01 aws-dublin kernel: rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
Aug 02 05:41:01 aws-dublin kernel: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
Aug 02 05:41:01 aws-dublin kernel: RCU Tasks Rude: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
Aug 02 05:41:01 aws-dublin kernel: RCU Tasks Trace: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
Aug 02 05:41:01 aws-dublin kernel: rcu: Offload RCU callbacks from CPUs: 1-2.
Aug 02 05:41:01 aws-dublin kernel: rcu: srcu_init: Setting srcu_struct sizes based on contention.
Aug 02 05:41:01 aws-dublin kernel: rcu: Hierarchical SRCU implementation.
Aug 02 05:41:01 aws-dublin kernel: rcu: Max phase no-delay instances is 1000.

$ ps -eo pid,cpuid,comm | grep rcu
    4     0 kworker/R-rcu_gp
    8     0 kworker/0:0-rcu_gp
   14     0 rcu_tasks_rude_kthread
   15     0 rcu_tasks_trace_kthread
   17     3 rcu_sched
   18     3 rcuog/0
   19     0 rcuos/0
   20     0 rcu_exp_par_gp_kthread_worker/0
   21     3 rcu_exp_gp_kthread_worker
   31     3 rcuos/1
   38     3 rcuos/2
   45     0 rcuos/3

Yesterday, I hypothesized that maybe my isolcpus setting could explain
why rcuos/0 was present... but this cannot explain why rcuos/3 is there
too!

This is strange...
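
For what it is worth, here is a minimal Python sketch of how the ps
listing can be cross-checked against the rcu_nocbs= boot parameter. It
is only an illustration: the helper names parse_cpu_list() and
rcuo_threads() are made up for this mail, the parser handles plain
lists and ranges only (not the stride syntax the kernel also accepts),
and the sample text is a subset of the ps output above.

```python
import re


def parse_cpu_list(spec):
    """Expand a simple CPU list such as "1,2" or "0-2,5" into a set of ints.

    Assumption: only comma-separated numbers and lo-hi ranges are handled.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def rcuo_threads(ps_output):
    """Map kthread kind ("rcuog", "rcuos") to {cpu in name: cpu it last ran on}.

    Expects lines in the "pid cpuid comm" layout of `ps -eo pid,cpuid,comm`.
    """
    found = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        _pid, cpuid, comm = fields
        match = re.fullmatch(r"(rcuo[a-z])/(\d+)", comm)
        if match:
            found.setdefault(match.group(1), {})[int(match.group(2))] = int(cpuid)
    return found


# Sample input: the rcu_sched/rcuog/rcuos rows from the listing above.
PS_OUTPUT = """\
17 3 rcu_sched
18 3 rcuog/0
19 0 rcuos/0
31 3 rcuos/1
38 3 rcuos/2
45 0 rcuos/3
"""

nocb = parse_cpu_list("1,2")  # value of rcu_nocbs= from the boot params
threads = rcuo_threads(PS_OUTPUT)

# rcuos/N kthreads whose N is not in the rcu_nocbs list.
unexpected = sorted(set(threads.get("rcuos", {})) - nocb)
print("rcuos kthreads for CPUs outside rcu_nocbs:", unexpected)
# prints: rcuos kthreads for CPUs outside rcu_nocbs: [0, 3]
```

This only restates the puzzle in a form that is easy to re-run after
changing the boot parameters; it does not explain why rcuos/0 and
rcuos/3 exist.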