On Wed, Feb 17, 2021 at 07:01:59PM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-02-17 07:54:47 [-0800], Paul E. McKenney wrote:
> > > I thought boosting is accomplished by acquiring a rt_mutex in a
> > > rcu_read_lock() section. Do you have some code to point me to, to see
> > > how a timer is involved here? Or is it the timer saying that *now*
> > > boosting is needed?
> >
> > Yes, this last, which is in the grace-period kthread code, for example,
> > in rcu_gp_fqs_loop().
> >
> > > If your hrtimer is a "normal" hrtimer then it will be served by
> > > ksoftirqd, too. You would additionally need one of the
> > > HRTIMER_MODE_*_HARD to make it work.
> >
> > Good to know. Anything I should worry about for this mode?
>
> Well. It is always hardirq. No spinlock_t, etc. within that callback.
> If you intend to wake a thread, that thread needs an elevated priority,
> otherwise it won't be scheduled (assuming there is an RT task running
> which would otherwise block ksoftirqd).

Good to know, thank you!  I believe that all the needed locks are already
raw spinlocks, but the actual kernel code always takes precedence over
one's beliefs.

> Ah. One nice thing is that you can move the RCU threads to a
> housekeeping CPU - away from the CPU(s) running the RT tasks. Would this
> scenario still be affected (if ksoftirqd would be blocked)?

At this point, I am going to say that it is the sysadm's job to place the
rcuo kthreads, and if they are placed poorly, life is hard.

This means that I need to create a couple of additional polling RCU
grace-period functions for rcutorture's priority-boosting use, but I
probably should have done that a long time ago.  It is simpler to just
call a polling API occasionally than to handle all the corner cases of
keeping an RCU callback queued.

> Oh. One thing I forgot to mention: the timer_list timer is nice in terms
> of moving the timeout forward (the timer did not fire, the condition is
> still true, and you simply move the timeout forward).
> An hrtimer, on the other hand, needs to be removed, forwarded, and added
> back to the "timer tree". This is considered more expensive, especially
> if the timer does not fire.

There are some timers that are used to cause a wakeup to happen from a
clean environment, but maybe these can instead use irq_work.

> > Also, the current test expects callbacks to be invoked, which involves a
> > number of additional kthreads and timers, for example, in nocb_gp_wait().
> > I suppose I could instead look at grace-period sequence numbers, but I
> > believe that real-life use cases needing RCU priority boosting also need
> > the callbacks to be invoked reasonably quickly (as in within hundreds
> > of milliseconds up through very small numbers of seconds).
>
> A busy/overloaded KVM host could lead to delays by not scheduling the
> guest for a while.

That it can!  Aravinda Prasad prototyped a mechanism for hinting to the
hypervisor in such cases, but I don't know that this ever saw the light
of day.

> My understanding of the need for RCU boosting is to get a task,
> preempted (by an RT task) within an RCU read-side critical section, back
> on the CPU to at least close that section, so that it becomes possible
> to run RCU callbacks and free memory.
> The 10 seconds without RCU callbacks shouldn't be bad unless the OOM
> killer gets nervous (or we see memory-allocation failures).
> Also, running thousands of accumulated callbacks isn't good either.

Sounds good, thank you!

							Thanx, Paul

> > Thoughts?
> >
> > 							Thanx, Paul
> Sebastian
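P.S.  For my own notes, here is the rough shape of the hrtimer setup I
think you are describing: one of the _HARD modes, a callback that runs in
hard-irq context and therefore takes no spinlock_t, and a wakeup of a
kthread that is already running at RT priority.  Untested sketch with
made-up names (boost_timer, boost_kthread_task), not existing RCU code:

	#include <linux/hrtimer.h>
	#include <linux/ktime.h>
	#include <linux/sched.h>

	static struct hrtimer boost_timer;
	static struct task_struct *boost_kthread_task; /* Must be RT priority. */

	/* Hard-irq context: raw_spinlock_t only, no spinlock_t. */
	static enum hrtimer_restart boost_timer_fn(struct hrtimer *t)
	{
		wake_up_process(boost_kthread_task);
		return HRTIMER_NORESTART;
	}

	static void boost_timer_arm(unsigned long delay_ms)
	{
		hrtimer_init(&boost_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
		boost_timer.function = boost_timer_fn;
		hrtimer_start(&boost_timer, ms_to_ktime(delay_ms),
			      HRTIMER_MODE_REL_HARD);
	}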
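And here is roughly how I picture rcutorture checking for boosting
success by polling rather than by queuing its own callback.  The
start_poll_synchronize_rcu() and poll_state_synchronize_rcu() names are
the ones I have in mind for the functions I still need to write, so
please treat both the names and the signatures as provisional:

	#include <linux/delay.h>
	#include <linux/jiffies.h>
	#include <linux/rcupdate.h>
	#include <linux/sched.h>

	/* Did a grace period complete within the given number of jiffies? */
	static bool boost_gp_completed_within(unsigned long timeout_jiffies)
	{
		unsigned long deadline = jiffies + timeout_jiffies;
		unsigned long gp_state = start_poll_synchronize_rcu();

		while (!poll_state_synchronize_rcu(gp_state)) {
			if (time_after(jiffies, deadline))
				return false; /* Too slow: boosting presumably failed. */
			schedule_timeout_interruptible(1);
		}
		return true; /* Grace period completed despite the RT load. */
	}

The nice thing is that there is no callback to keep queued, hence no
dependence on where the sysadm placed the rcuo kthreads.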
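Finally, an equally untested sketch of the irq_work alternative for those
timers whose only job is to provide a clean environment for a wakeup
(again, the gp_wakeup_* names are made up for illustration):

	#include <linux/irq_work.h>
	#include <linux/sched.h>

	static struct task_struct *gp_kthread_task;
	static struct irq_work gp_wakeup_work;

	/* Runs in hard-irq context shortly after irq_work_queue(). */
	static void gp_wakeup_func(struct irq_work *work)
	{
		wake_up_process(gp_kthread_task);
	}

	static void gp_wakeup_init(void)
	{
		init_irq_work(&gp_wakeup_work, gp_wakeup_func);
	}

	/* Call instead of arming a timer just to get a clean wakeup context. */
	static void gp_request_wakeup(void)
	{
		irq_work_queue(&gp_wakeup_work);
	}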