Re: unexpected result with rcu_nocbs option

Olivier Langlois <olivier@xxxxxxxxxxxxxx> · Fri, 02 Aug 2024 11:36:22 -0400

On Fri, 2024-08-02 at 06:58 -0700, Paul E. McKenney wrote:
> On Fri, Aug 02, 2024 at 09:46:03AM -0400, Olivier Langlois wrote:
> > On Thu, 2024-08-01 at 20:01 -0700, Paul E. McKenney wrote:
> > > 
> > > Very good!!!
> > > 
> > > The do_nocb_deferred_wakeup_timer() is due to call_rcu() being invoked
> > > in a context where it might not be safe to do a wakeup().  RCU doesn't
> > > have a lot of choice in this situation, so the usual approach is to
> > > figure out what is invoking call_rcu() on your nohz_full CPUs and to
> > > make it run elsewhere.
> > > 
> > > I don't know what is happening with mix_interrupt_randomness().
> > > 
> > > 							Thanx,
> > > Paul
> > there are a few more that are popping up, like:
> >
> > tsc_sync_check_timer_fn
> > mce_timer_fn
> >
> > but those 2 + do_nocb_deferred_wakeup_timer do not immediately
> > generate an interrupt. Only mix_interrupt_randomness does, because it
> > adds an already-expired timer, so the CPU is kicked on insertion.
> > 
> > I have quickly looked at drivers/char/random.c
> >
> > and there is no obvious way to address this that I can think of
> > without causing potentially serious side effects...
> > 
> > but I really find it mysterious that only 1 of my nohz_full CPUs is
> > impacted by this...
> >
> > and imho, it does not sound like a good idea to include the interrupt
> > randomness of a nohz_full CPU...
> > 
> > I think that I am going to throw in the towel on reaching the goal of
> > 100% interrupt-free for now. The amount of effort required to reach
> > the goal vs the diminishing returns I can get is not a good deal. For
> > now, I am going to tolerate this 27 µs interrupt once every 2-3
> > seconds...
> > 
> > but I find this challenge fascinating, and I'll start following
> > Brendan Gregg's blog to learn more about the field.
> > 
> > thank you very much for your assistance. I am leaving with the
> > impression that the RCU dev list is very helpful and friendly!
> 
> Are you doing system calls on your worker CPUs?  If so, one
> straightforward way to get rid of this is to make your application push
> the system calls off to the housekeeping CPU.  Keep in mind that system
> calls often need to defer work of one sort or another.
> 
> The real-time guys would know more about this sort of thing.
> 
> 							Thanx, Paul
very little.

I signal a pthread condition variable, which calls:

- gettid()
- futex()

and there is also madvise(MADV_DONTNEED)
(I believe this comes from tcmalloc)
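For the condition-variable case, one possible way to keep the signalling thread out of the kernel (a sketch, not something tested on this setup) is to drop pthread_cond_signal() on the nohz_full side and let the waiting thread, running on a housekeeping CPU, poll a shared counter instead. The `notifier` type and the `notify_*` functions are hypothetical names, and the sketch assumes the consumer can afford to poll.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space notifier: the producer (on a nohz_full CPU)
 * only bumps a counter, so it never enters the kernel. The consumer
 * (on a housekeeping CPU) polls the counter instead of sleeping in
 * pthread_cond_wait(). */
struct notifier {
    _Atomic uint64_t seq;   /* incremented once per notification */
};

static void notifier_init(struct notifier *n)
{
    atomic_init(&n->seq, 0);
}

/* Producer side: one atomic increment with release ordering,
 * no futex() and no other system call. */
static void notify_signal(struct notifier *n)
{
    atomic_fetch_add_explicit(&n->seq, 1, memory_order_release);
}

/* Consumer side: returns true if notifications arrived since *seen
 * and updates *seen. Meant to be called in a polling loop. */
static bool notify_poll(struct notifier *n, uint64_t *seen)
{
    uint64_t now = atomic_load_explicit(&n->seq, memory_order_acquire);
    if (now == *seen)
        return false;
    *seen = now;
    return true;
}
```

The trade-off is that the consumer spends cycles on the housekeeping CPU (it could sleep briefly between polls there, since that CPU is allowed to take interrupts) in exchange for the producer issuing no futex() call at all. The release/acquire pair makes anything the producer wrote before notify_signal() visible to the consumer once notify_poll() returns true.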

you are opening up my horizons and I think you are right. If the
nohz_full thread does not enter the kernel, it cannot disturb the
nohz_full setup.

I need to take a break from this project to take care of other stuff
that I have neglected while being absorbed by this never-ending
challenge... but I'll definitely return to it with this new angle of
attack...

with so few syscalls, it seems feasible to avoid them one way or
another.

at some point, it might be much easier to avoid the kernel than to
fight with it to make it do what you want.

here is another side note: I am currently listening to your talk about
NO_HZ_FULL and I am enjoying it very much!

this made me realize that you were right about the odd detail that my
setup has 4 rcuos kthreads... I do not understand either why I end up
with 4. You mention CONFIG_RCU_NOCB_CPU_ALL in your talk (now
CONFIG_RCU_NOCB_CPU_DEFAULT_ALL, I believe). I thought that maybe I had
this option set unknowingly... no, I don't:

/proc $ zcat config.gz | grep RCU
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_NEED_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_DEFAULT_ALL is not set
# CONFIG_RCU_LAZY is not set
# end of RCU Subsystem
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
# CONFIG_RCU_CPU_STALL_CPUTIME is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging
# CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is not set

boot params:
isolcpus=0,1,2 nohz_full=1,2 rcu_nocbs=1,2 rcutree.rcu_nocb_gp_stride=4 irqaffinity=3

dmesg output:
Aug 02 05:41:01 aws-dublin kernel: rcu: Hierarchical RCU implementation.
Aug 02 05:41:01 aws-dublin kernel: rcu:         RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=4.
Aug 02 05:41:01 aws-dublin kernel: rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
Aug 02 05:41:01 aws-dublin kernel: rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
Aug 02 05:41:01 aws-dublin kernel: RCU Tasks Rude: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
Aug 02 05:41:01 aws-dublin kernel: RCU Tasks Trace: Setting shift to 2 and lim to 1 rcu_task_cb_adjust=1.
Aug 02 05:41:01 aws-dublin kernel: rcu:         Offload RCU callbacks from CPUs: 1-2.
Aug 02 05:41:01 aws-dublin kernel: rcu: srcu_init: Setting srcu_struct sizes based on contention.
Aug 02 05:41:01 aws-dublin kernel: rcu: Hierarchical SRCU implementation.
Aug 02 05:41:01 aws-dublin kernel: rcu:         Max phase no-delay instances is 1000.

$ ps -eo pid,cpuid,comm | grep rcu
      4     0 kworker/R-rcu_gp
      8     0 kworker/0:0-rcu_gp
     14     0 rcu_tasks_rude_kthread
     15     0 rcu_tasks_trace_kthread
     17     3 rcu_sched
     18     3 rcuog/0
     19     0 rcuos/0
     20     0 rcu_exp_par_gp_kthread_worker/0
     21     3 rcu_exp_gp_kthread_worker
     31     3 rcuos/1
     38     3 rcuos/2
     45     0 rcuos/3

yesterday, I hypothesized that maybe my isolcpus setting could
explain why rcuos/0 was present... but this cannot explain why rcuos/3
is there too!

this is strange...
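To make the overlap between the CPU lists in the boot parameters explicit, here is a small sketch that expands kernel-style CPU lists into bitmasks. It only handles the plain `a,b,c-d` form, and `cpulist_to_mask()` is a hypothetical helper, not a kernel or libc API. With the parameters above, CPU 0 is the only CPU in isolcpus but not in nohz_full/rcu_nocbs, and CPU 3 is in none of the isolation lists, so the isolcpus setting could at best account for rcuos/0, not rcuos/3.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: turn a CPU list such as "0,1,2" or "1-2" into a
 * bitmask (bit N set = CPU N listed). Only the plain "a,b,c-d" form is
 * handled; anything else, or a CPU above 63, yields 0. */
static uint64_t cpulist_to_mask(const char *s)
{
    uint64_t mask = 0;

    while (*s) {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;

        if (end == s)
            return 0;                    /* no number where one was expected */
        if (*end == '-') {               /* range "lo-hi" */
            const char *p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p)
                return 0;
        }
        if (lo < 0 || hi < lo || hi > 63)
            return 0;
        for (long cpu = lo; cpu <= hi; cpu++)
            mask |= UINT64_C(1) << cpu;
        if (*end == ',')
            end++;                       /* move on to the next element */
        else if (*end != '\0')
            return 0;                    /* unexpected character */
        s = end;
    }
    return mask;
}
```

For example, cpulist_to_mask("0,1,2") & ~cpulist_to_mask("1-2") leaves only bit 0 set, i.e. CPU 0.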