Re: [PATCH] rcu: Delay the RCU-selftests during boot.

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Wed, 9 Mar 2022 11:19:00 +0100

On 2022-03-08 10:10:40 [-0800], Paul E. McKenney wrote:
> > Correct. Before do_pre_smp_initcalls() there is no ksoftirqd so timers
> > won't be processed. Therefore even if the 26secs timer triggers, there
> > will be no RCU-stall warning. Also I'm not sure how much preemption
> > can happen at this point since there is not much going on.
> > Any delay at this point points to a lockup, and at this point the lockup
> > detector isn't even fully operational (it does not trigger here during
> > test).
> 
> Is sysrq operational at this point in RT?  If so, that can be used
> to diagnose the hang, at least in development environments.  If the
> need to use sysrq becomes a problem, checks/wakeups can be added to
> rcu_sched_clock_irq().

I doubt it (including !RT). The UART driver isn't loaded yet and any kind
of output is provided by an early_printk implementation, which is
output only, not input.

> And here is what I ended up hand-applying, presumably due to differences
> between -rt and -rcu.  I also expanded the changelog a bit.  Could you
> please check to make sure that I didn't mess something up?

Nope, good, I added something to the description.

> Oh, and I also couldn't resist taking advantage of 100 columns...
> What can I say?

You may not believe it but since last week I am indeed the owner of a
wide screen. So I do enjoy the 100 columns. Also I am still trying to
figure out what to do with the other half of the screen ;)

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 2895b3bb5f8a0ebe565c62b1d2e3e1efca669962
> Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Date:   Tue Mar 8 09:54:13 2022 -0800
> 
>     rcu-tasks: Use schedule_hrtimeout_range() to wait for grace periods
>     
>     The synchronous RCU-tasks grace-period-wait primitives invoke
>     schedule_timeout_idle() to give readers a chance to exit their
>     read-side critical sections.  Unfortunately, this fails during early
>     boot on PREEMPT_RT because PREEMPT_RT relies solely on ksoftirqd to run
>     timer handlers.  Because ksoftirqd cannot operate until its kthreads
>     are spawned, there is a brief period of time following scheduler
>     initialization where PREEMPT_RT cannot run the timer handlers that
>     schedule_timeout_idle() relies on, resulting in a hang.

Could we add something like:
   The delay here is used infrequently, so it can be made to expire in
   hard-IRQ context on PREEMPT_RT without increasing the overall system
   latency.

My concern is that this (timer does not expire during boot, let's make
it raw) should not become a general example of how such things are
handled in the future.
I only managed to trigger this delay via the dynamic ftrace interface/
samples so I'm comfortable with this.
Moreover, I don't think that this timer, if batched with another one,
will increase the latency substantially, given that we trade the wakeup
of ksoftirqd for the wakeup of the actual thread, and that one wakeup
shouldn't make much of a difference. The difference is that with
ksoftirqd, if we expire multiple timers then each timer is expired with
interrupts enabled, i.e. preemptible.
That is why I favoured spawning ksoftirqd earlier but I think this
works, too.

>     To avoid this boot-time hang, this commit replaces schedule_timeout_idle()
>     with schedule_hrtimeout(), so that the timer expires in hardirq context.
>     This ensures that the timer fires even on PREEMPT_RT throughout the
>     irqs-enabled portions of boot as well as during runtime.
>     
>     The timer is set to expire between fract and fract + HZ / 2 jiffies in
>     order to align with any other timers that might expire during that time,
>     thus reducing the number of wakeups.
>     
>     Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
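
For reference, the expiry window from the changelog (between fract and
fract + HZ / 2 jiffies) can be worked out in nanoseconds with a small
userspace sketch. This is not the patch itself: jiffies_to_ns(),
struct expiry_window and gp_wait_window() are hypothetical names made up
for illustration, and hz is passed in as a parameter instead of using
the kernel's HZ.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical userspace stand-in for a jiffies-to-nanoseconds
 * conversion. hz is the tick rate (the kernel's HZ), passed explicitly.
 */
static uint64_t jiffies_to_ns(uint64_t j, uint64_t hz)
{
	assert(hz > 0);
	return j * NSEC_PER_SEC / hz;
}

/* Earliest and latest relative expiry of the grace-period wait. */
struct expiry_window {
	uint64_t earliest_ns;	/* fract jiffies from now */
	uint64_t latest_ns;	/* fract + HZ / 2 jiffies from now */
};

/*
 * Model of the window described in the changelog: the timer may fire
 * anywhere between fract and fract + HZ / 2 jiffies, so that it can be
 * batched with other timers expiring in that range. In the kernel the
 * two values would be handed to schedule_hrtimeout_range() as the
 * expiry and the allowed slack (assumption: with a hard-IRQ hrtimer
 * mode so that the timer fires without ksoftirqd on PREEMPT_RT).
 */
static struct expiry_window gp_wait_window(uint64_t fract, uint64_t hz)
{
	struct expiry_window w;

	w.earliest_ns = jiffies_to_ns(fract, hz);
	w.latest_ns = w.earliest_ns + jiffies_to_ns(hz / 2, hz);
	return w;
}
```

With hz = 250 and fract = 25, gp_wait_window() gives a window of 100 ms
to 600 ms, i.e. the slack is always half a second regardless of HZ.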

Sebastian