Re: Wakes of the rcuc/ thread on isolated CPUs.

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Mon, 8 Jul 2024 07:07:54 -0700

On Mon, Jul 08, 2024 at 08:38:38AM +0200, Sebastian Andrzej Siewior wrote:
> On 2024-07-05 18:15:25 [-0700], Paul E. McKenney wrote:
> > > Looking at the patch, there would be a delay up to 5 secs which would
> > 
> > I would have said "up to 1 sec", so what am I missing?
> 
> The patch description said that there is a 5 sec upper limit. Yes, the
> default is 1 sec.

Ah, yes, there is a five-second upper limit on the value of the
rcutree.nohz_full_patience_delay kernel boot parameter and there is a
default one-second value.  The default value places a one-second upper
limit on the RCU grace-period kthread's patience.

So I guess we were both right.  ;-)
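
To make the default-versus-limit distinction concrete, here is a minimal
userspace sketch of how a requested value might be turned into an effective
one.  This is not the kernel's code: the helper name, the macro names, the
use of milliseconds as the unit, and the handling of negative values are all
assumptions made for illustration.  Only the one-second default and the
five-second cap come from the discussion above.

```c
/* Illustrative only: names, units (ms) and negative-value handling are assumed. */
#define PATIENCE_DEFAULT_MS 1000 /* one-second default, per the discussion */
#define PATIENCE_MAX_MS     5000 /* five-second upper limit, per the discussion */

/*
 * Return the patience delay that would take effect.
 * was_specified is nonzero if a value was supplied at boot.
 */
static int effective_patience_ms(int requested_ms, int was_specified)
{
	if (!was_specified)
		return PATIENCE_DEFAULT_MS;  /* nothing supplied: use the default */
	if (requested_ms < 0)
		return 0;                    /* assumed: negative means "no patience" */
	if (requested_ms > PATIENCE_MAX_MS)
		return PATIENCE_MAX_MS;      /* cap oversized requests */
	return requested_ms;
}
```

With this sketch, an unspecified value yields 1000, a request of 7000 is
capped to 5000, and a request of 250 is used as given.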

> > > mean if the task consumes 100% of the CPU then it doesn't change a
> > > thing.
> > 
> > As long as RCU's grace-period kthread gets some CPU and as long as
> > the CPU-bound task executes often in userspace, that task's CPU's rcuc
> > kthread need never run.  The grace-period kthread would see that the CPU
> > has been in an extended quiescent state, and would report that quiescent
> > state on that CPU's behalf.
> 
> Okay. So the 100% usage is the problem indeed.

And, if that usage is nohz_full userspace execution, also the potential
solution.
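
The reasoning in the quoted paragraph can be sketched as a toy decision
function.  This is a deliberately simplified model, not RCU's actual
implementation: the struct, the enum and the function name are hypothetical,
and real RCU tracks far more state than two flags.  It only shows why a CPU
that is seen in an extended quiescent state never needs its rcuc kthread to
run for the grace period to advance.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one CPU as seen by the grace-period kthread. */
struct cpu_view {
	bool qs_reported;            /* quiescent state already reported */
	bool in_nohz_full_userspace; /* CPU observed in an extended quiescent state */
};

enum gp_action {
	GP_NOTHING,          /* nothing left to do for this CPU */
	GP_REPORT_ON_BEHALF, /* grace-period kthread reports for the CPU */
	GP_WAIT_FOR_CPU      /* CPU (for example its rcuc kthread) must report */
};

/* Decide what the grace-period kthread does for one CPU in this toy model. */
static enum gp_action gp_scan_cpu(const struct cpu_view *cpu)
{
	if (cpu->qs_reported)
		return GP_NOTHING;
	if (cpu->in_nohz_full_userspace)
		return GP_REPORT_ON_BEHALF; /* no rcuc wakeup needed on that CPU */
	return GP_WAIT_FOR_CPU;
}
```

In this model, only the last case depends on the CPU-bound task's CPU doing
any RCU work itself, which is the case where the 100% usage hurts.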

> > > Thank you Paul for the pointers.
> > > 
> > > > This is again a workaround.  Clearly, it would be better if we could
> > > > eliminate that second rcuc wakeup.  I tried something similar some time
> > > > back, and there was a problem with it.  I will see if I can reconstitute
> > > > the corresponding brain cells.
> > > 
> > > Is my assumption correct that this is needed in order to push the
> > > grace period forward, and that otherwise the whole thing is stuck?
> > 
> > Again, if the CPU running the CPU-bound task executes in nohz_full
> > userspace context, that CPU's rcuc kthread need never run.
> > 
> > Of course, if you tried the patch and it didn't help, that is another
> > story.  Hardware facts beat human theories, now as always.
> 
> of course.
> 
> > > > But in the meantime, one advantage of the workaround is that in the
> > > > common case, it would reduce the number of rcuc wakeups to zero, rather
> > > > than to just one.
> > > > 
> > > > Thoughts?
> > > 
> > > I *think* if what I just wrote is correct, I will either have to raise
> > > the priority of rcuc/ or make the thread that consumes 100% of the CPU
> > > lose its RT priority. Then with the limited number of wakeups it should
> > > be doable.
> > 
> > You can:  (1) Raise the rcuc kthread's priority, as you say, (2) Ensure
> > that the CPU-bound task runs frequently (or even always) in nohz_full
> > usermode context, or (3) #2 and also apply the patch, which would in
> > addition prevent the wakeups.
> > 
> > I think.  After all, I could easily be missing something here.
> 
> Let me backport and see what happens in the end.
> 
> Thank you.

Thank you in advance, and I look forward to hearing how it goes!

> > > PS: I do remember the RCU-task thread we had. I did have an idea but I
> > > need to check if this is feasible first. So I did not forget, just slow…
> > 
> > I must confess that I have been wondering about how much tracing goes
> > on within real-time systems running in production...
> 
> I know little about this. However, it is used during testing of production
> systems to see what is going on, because of its low overhead.

That should introduce some fun constraints into tracing mechanisms and
their synchronizations.  Or some fun constraints into which mechanisms
are used in production real-time systems.  But why not both?  ;-)

							Thanx, Paul