Re: unexpected result with rcu_nocbs option

Olivier Langlois <olivier@xxxxxxxxxxxxxx> · Thu, 01 Aug 2024 14:53:09 -0400

On Thu, 2024-08-01 at 10:56 -0700, Paul E. McKenney wrote:
> This presentation might be helpful:
> 
> 	https://www.youtube.com/watch?v=5yTf7u5J_kc
> 
thx for the link. I will be sure to make time to watch it to improve my
understanding of RCU.

> > you didn't seem concerned that rcuog was missing for cpu1.
> > rcutree.rcu_nocb_gp_stride sets the group size (stride) and there is 1
> > rcuog per group. I guess if I wanted to see a rcuog/1, I would need to
> > set rcutree.rcu_nocb_gp_stride=1
> 
> The grace-period functions of the lead rcuos kthreads have been pushed
> off into a smaller number of rcuog kthreads.  Which is why you do not
> (repeat, *not*) have an rcuog kthread for each CPU.
> 
> > it is probably not best possible setting but it is easy and simple to
> > try out to see if it fix my problem.
> 
> You probably do not want an rcuog kthread for each CPU, but it cannot
> hurt to try it.
> 
> 							Thanx, Paul

so does the "og" in the rcuog name stand for Offloading Grace-period?

just with this little exchange, there is a big layer of mystery that
got removed in my mind!
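
To check my own understanding of the stride, here is a rough Python model
of the grouping. It is only a sketch under my assumptions: CPUs are grouped
in runs of "stride" CPUs, the first CPU of each group hosts the rcuog
kthread, and every CPU is treated as offloaded. The helper name
rcuog_leaders() and the 4-CPU example are made up for illustration; this
is not the kernel code.

```python
def rcuog_leaders(nr_cpus, stride):
    """Map each group-leader CPU to the CPUs in its group.

    Simplified model (assumption): group = cpu // stride, and the first
    CPU of the group is the one that gets an rcuog kthread.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    groups = {}
    for cpu in range(nr_cpus):
        leader = (cpu // stride) * stride
        groups.setdefault(leader, []).append(cpu)
    return groups


# Hypothetical 4-CPU box: compare a stride of 2 with a stride of 1.
for stride in (2, 1):
    print(f"stride={stride}: {rcuog_leaders(4, stride)}")
```

In this model, a stride of 2 gives leaders on CPUs 0 and 2 only, so there
is no rcuog/1, while a stride of 1 makes every CPU its own leader.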

my big take-away from your last 2 emails is that you do not think that
the interrupts that I am seeing originate from RCU...

I'll try to figure out how ftrace or bpftrace can be used to track down
the interrupt origin...

possibly there is a way to ask them to record the stack when a
specific function is called...
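
From a first read of the ftrace documentation, the function tracer has a
func_stack_trace option that seems to do this. Below is a small Python
sketch of the tracefs writes I think are needed. Assumptions on my side:
tracefs is mounted at /sys/kernel/tracing, irq_work_queue is listed in
available_filter_functions on my kernel, and the helper names
plan_stack_trace() and apply_plan() are made up. The filter is set first
so that stacks are not recorded for every kernel function.

```python
import os

TRACEFS = "/sys/kernel/tracing"  # assumed tracefs mount point


def plan_stack_trace(func, tracefs=TRACEFS):
    """Return the ordered (file, value) writes to record a stack per call."""
    return [
        # Limit tracing to one function before enabling stack traces.
        (os.path.join(tracefs, "set_ftrace_filter"), func),
        (os.path.join(tracefs, "options/func_stack_trace"), "1"),
        (os.path.join(tracefs, "current_tracer"), "function"),
        (os.path.join(tracefs, "tracing_on"), "1"),
    ]


def apply_plan(plan):
    """Perform the writes; needs root and a kernel with the function tracer."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


# Dry run: only print the equivalent shell commands, nothing is written.
for path, value in plan_stack_trace("irq_work_queue"):
    print(f"echo {value} > {path}")
```

If this works as I expect, the recorded stacks should then be readable
from the trace file in the same directory.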

I will look into Brendan Gregg's blog for more ideas on how this can be
done

but this is not something that I have ever done before... I have
definitely been outside my comfort zone for the last 7 days. It is very
challenging but at the same time very rewarding...

I did a quick search for who was calling irq_work_queue().

The call made by rcu_irq_work_resched() seemed to be so tailored to
my config settings and what I was seeing that I assumed it was the
one, but maybe not...

worst-case scenario, I'll add a printk there to convince myself. At
worst, the trace will show up once every 3 seconds.