Re: unexpected result with rcu_nocbs option

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 1 Aug 2024 08:28:48 -0700

On Thu, Aug 01, 2024 at 10:52:15AM -0400, Olivier Langlois wrote:
> the initial parsing is fine...
> 
> Aug 01 14:05:51 aws-dublin kernel: rcu: Hierarchical RCU
> implementation.
> Aug 01 14:05:51 aws-dublin kernel: rcu:         RCU restricting CPUs
> from NR_CPUS=128 to nr_cpu_ids=4.
> Aug 01 14:05:51 aws-dublin kernel:         Rude variant of Tasks RCU
> enabled.
> Aug 01 14:05:51 aws-dublin kernel:         Tracing variant of Tasks RCU
> enabled.
> Aug 01 14:05:51 aws-dublin kernel: rcu: RCU calculated value of
> scheduler-enlistment delay is 10 jiffies.
> Aug 01 14:05:51 aws-dublin kernel: rcu: Adjusting geometry for
> rcu_fanout_leaf=16, nr_cpu_ids=4
> Aug 01 14:05:51 aws-dublin kernel: RCU Tasks Rude: Setting shift to 2
> and lim to 1 rcu_task_cb_adjust=1.
> Aug 01 14:05:51 aws-dublin kernel: RCU Tasks Trace: Setting shift to 2
> and lim to 1 rcu_task_cb_adjust=1.
> Aug 01 14:05:51 aws-dublin kernel: NR_IRQS: 8448, nr_irqs: 456,
> preallocated irqs: 16
> Aug 01 14:05:51 aws-dublin kernel: NO_HZ: Full dynticks CPUs: 1-2.
> Aug 01 14:05:51 aws-dublin kernel: rcu:         Offload RCU callbacks
> from CPUs: 1-2.
> Aug 01 14:05:51 aws-dublin kernel: rcu: srcu_init: Setting srcu_struct
> sizes based on contention.
> 
> On Thu, 2024-08-01 at 10:28 -0400, Olivier Langlois wrote:
> > this is with kernel 6.10.2
> > 
> > I have these options set on the boot command line:
> > isolcpus=0,1,2 nohz_full=1,2 rcu_nocbs=1,2
> > 
> > $ ps -eo pid,cpuid,comm | grep rcuog
> >      18     3 rcuog/0
> >      38     0 rcuog/2
> > 
> > I do not understand why a rcuog task is spawned for cpu0.
> > I would have expected to have one for cpu1.

These kthreads handle grace periods, each for a group of CPUs.  You should
have one rcuog kthread for each group of roughly sqrt(nr_cpu_ids) CPUs that
contains at least one offloaded CPU.  In your case, sqrt(4) is 2, so the
groups are CPUs 0-1 and CPUs 2-3.  Each rcuog kthread is named after the
first CPU in its group, which is why you see rcuog/0 rather than rcuog/1.
You can use the rcutree.rcu_nocb_gp_stride kernel boot parameter to
override this default.  For example, you might want
rcutree.rcu_nocb_gp_stride=4 in your case.
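
To make the grouping concrete, here is a rough sketch in Python of how I
would expect the groups to fall out.  This is a simplified model, not the
kernel's code: the function name nocb_groups() is made up for illustration,
the default stride is taken to be the integer square root of nr_cpu_ids,
and each group is assumed to be labeled by its lowest-numbered CPU.

```python
import math


def nocb_groups(nr_cpu_ids, offloaded, stride=None):
    """Model which rcuog groups exist (hypothetical helper, not kernel code).

    Returns {first_cpu_of_group: [cpus in group]} for every group that
    contains at least one offloaded CPU.
    """
    if stride is None:
        # Assumption: default group size is roughly sqrt(nr_cpu_ids).
        stride = math.isqrt(nr_cpu_ids)
    groups = {}
    for first in range(0, nr_cpu_ids, stride):
        members = list(range(first, min(first + stride, nr_cpu_ids)))
        # A group only needs a grace-period kthread if it has offloaded CPUs.
        if any(cpu in offloaded for cpu in members):
            groups[first] = members
    return groups


# nr_cpu_ids=4 with rcu_nocbs=1,2 and the default stride.
print(nocb_groups(4, {1, 2}))
# Same system with a stride of 4, as with rcutree.rcu_nocb_gp_stride=4.
print(nocb_groups(4, {1, 2}, stride=4))
```

Under these assumptions the first call gives groups starting at CPUs 0 and
2, which matches the rcuog/0 and rcuog/2 in your ps output, and the second
call gives a single group covering all four CPUs.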

> > I do have a
> >      31     3 rcuos/1
> > 
> > I am not familiar enough with rcu to know what rcuos is for.

This is the kthread that invokes the callbacks for CPU 1, assuming you
have a non-preemptible kernel (otherwise it would be named rcuop/1, for
historical reasons that seemed like a good idea at the time).  Do you also
have an rcuos/2?  (See the help text for CONFIG_RCU_NOCB_CPU.)

> > the absence of rcuog/1 is causing rcu_irq_work_resched() to raise
> > an interrupt every 2-3 seconds on cpu1.

Did you build with CONFIG_RCU_LAZY=y?

Did you use something like taskset to confine the rcuog and rcuos
kthreads to CPUs 0 and 3 (you seem to have 4 CPUs)?
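
In case it helps, here is a minimal Python sketch of that confinement,
as an alternative to running taskset by hand.  The helper names
rcu_kthread_pids() and confine() are made up for illustration, the input
is assumed to be the output of "ps -eo pid,cpuid,comm", and the actual
affinity change is left commented out because it needs root on a real
Linux system.

```python
import os
import re


def rcu_kthread_pids(ps_output):
    """Map PID -> name for rcuog/N, rcuos/N and rcuop/N kthreads.

    Expects text in the format printed by `ps -eo pid,cpuid,comm`.
    """
    pids = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and re.fullmatch(r"rcuo[gps]/\d+", fields[2]):
            pids[int(fields[0])] = fields[2]
    return pids


def confine(pids, cpus):
    """Restrict each PID to the given set of CPUs (Linux only, needs root)."""
    for pid in pids:
        os.sched_setaffinity(pid, cpus)


# Sample lines taken from the ps output quoted in this thread.
sample = """\
     18     3 rcuog/0
     38     0 rcuog/2
     31     3 rcuos/1
"""
found = rcu_kthread_pids(sample)
print(found)
# confine(found, {0, 3})  # uncomment on the real system, as root
```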

Might that interrupt be due to a call_rcu() on CPU 1?  If so, can the
work causing that call_rcu() be placed on some other CPU?

> > I am currently reading rcu/tree_nocb.h to try to make sense of what I
> > am seeing but I am pinging the rcu list just in case what I am seeing
> > would be immediately obvious to one of you...

Others might have more suggestions...

							Thanx, Paul