On Thu, 2024-08-01 at 10:56 -0700, Paul E. McKenney wrote:
> This presentation might be helpful:
>
> https://www.youtube.com/watch?v=5yTf7u5J_kc
>
thx for the link. I will be sure to make time to watch it to improve my
understanding of RCU.

> > you didn't seem concerned that rcuog was missing for cpu1.
> > rcutree.rcu_nocb_gp_stride sets the group size (stride) and there
> > is 1 rcuog per group. I guess if I wanted to see a rcuog/1, I would
> > need to set rcutree.rcu_nocb_gp_stride=1
>
> The grace-period functions of the lead rcuos kthreads have been pushed
> off into a smaller number of rcuog kthreads. Which is why you do not
> (repeat, *not*) have an rcuog kthread for each CPU.
>
> > it is probably not best possible setting but it is easy and simple
> > to try out to see if it fix my problem.
>
> You probably do not want an rcuog kthread for each CPU, but it cannot
> hurt to try it.
>
> Thanx, Paul

so the og in the rcuog name means Offloading Grace-period?
With just this little exchange, a big layer of mystery got removed from
my mind!

my big take-away from your last 2 emails is that you do not think that
the interrupts that I am seeing originate from RCU...

I'll try to figure out how to track down the interrupt origin with
ftrace or bpftrace... possibly there is a way to ask them to record the
stack when a specific function is called...

I will look into Brendan Gregg's blog for more ideas on how this can be
done, but this is something that I have never done before... I have
definitely been outside my comfort zone for the last 7 days. It is very
challenging but at the same time very rewarding...

I did a quick search for who was calling irq_work_queue(). The call made
by rcu_irq_work_resched() seemed so tailored to my config settings and
to what I was seeing that I assumed it was the one, but maybe not...

worst case scenario, I'll add a printk there to convince myself.
Worst case scenario, the trace will show up once every 3 seconds.