On Wed, Mar 09, 2022 at 11:19:00AM +0100, Sebastian Andrzej Siewior wrote:
> On 2022-03-08 10:10:40 [-0800], Paul E. McKenney wrote:
> > > Correct. Before do_pre_smp_initcalls() there is no ksoftirqd so timers
> > > won't be processed. Therefore even if the 26secs timer triggers, there
> > > will be no RCU-stall warning. Also I'm not sure how much preemption
> > > can happen at this point since there is not much going on.
> > > Any delay at this stage points to a lockup, and at this point the
> > > lockup detector isn't even fully operational (it does not trigger
> > > here during test).
> >
> > Is sysrq operational at this point in RT? If so, that can be used
> > to diagnose the hang, at least in development environments. If the
> > need to use sysrq becomes a problem, checks/wakeups can be added to
> > rcu_sched_clock_irq().
>
> I doubt it (including !RT). The UART driver isn't yet loaded and any kind
> of output is provided by an early_printk implementation which is just
> output not input.

OK, so rcu_sched_clock_irq() it is, should problems arise.

Or a boot-time switch between timer and hrtimer, but that could get
a bit ugly.

> > And here is what I ended up hand-applying, presumably due to differences
> > between -rt and -rcu. I also expanded the changelog a bit. Could you
> > please check to make sure that I didn't mess something up?
>
> Nope, good, I added something to the description.
>
> > Oh, and I also couldn't resist taking advantage of 100 columns...
> > What can I say?
>
> You may not believe it but since last week I am indeed the owner of a
> wide screen. So I do enjoy the 100 columns. Also I still try to figure
> out what to do with the other half of the screen ;)

Very good! And I am sure that you have seen what I do with the extra
screen real estate. ;-)

Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > commit 2895b3bb5f8a0ebe565c62b1d2e3e1efca669962
> > Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > Date: Tue Mar 8 09:54:13 2022 -0800
> >
> >     rcu-tasks: Use schedule_hrtimeout_range() to wait for grace periods
> >
> >     The synchronous RCU-tasks grace-period-wait primitives invoke
> >     schedule_timeout_idle() to give readers a chance to exit their
> >     read-side critical sections. Unfortunately, this fails during early
> >     boot on PREEMPT_RT because PREEMPT_RT relies solely on ksoftirqd to run
> >     timer handlers. Because ksoftirqd cannot operate until its kthreads
> >     are spawned, there is a brief period of time following scheduler
> >     initialization where PREEMPT_RT cannot run the timer handlers that
> >     schedule_timeout_idle() relies on, resulting in a hang.
>
> Could we add something like:
>   The delay here is used infrequently so it can be made to expire in
>   hard-IRQ context on PREEMPT_RT without increasing the overall system
>   latency.
>
> My concern here is that this (timer does not expire during boot, let's
> make it raw) should not be used as a general example of how things should
> be handled in the future.
> I only managed to trigger this delay via the dynamic ftrace interface/
> samples so I'm comfortable with this.
> Moreover I don't think that this timer, if batched with another one, will
> increase the latency substantially given that we trade here the wake
> of ksoftirqd for the wake of the actual thread and that one wake up
> shouldn't make much of a difference. The difference is that with
> ksoftirqd, if we expire multiple timers then each timer is expired with
> interrupts enabled - preemptible.
> That is why I favoured spawning ksoftirqd earlier but I think this
> works, too.

Good points. What I did was to make the addition shown below.
> >     To avoid this boot-time hang, this commit replaces schedule_timeout_idle()
> >     with schedule_hrtimeout_range(), so that the timer expires in hardirq
> >     context. This ensures that the timer fires even on PREEMPT_RT throughout
> >     the irqs-enabled portions of boot as well as during runtime.
> >
> >     The timer is set to expire between fract and fract + HZ / 2 jiffies in
> >     order to align with any other timers that might expire during that time,
> >     thus reducing the number of wakeups.

    Note that RCU-tasks grace periods are infrequent, so the use of hrtimer
    should be fine. In contrast, in common-case code, use of hrtimer
    could result in performance issues.

Does that work? And of course if RCU-tasks grace periods do become more
frequent, the next step would be to use hrtimers at boot time and timers
later on. But one step at a time.

I also added Martin Lau and Andrii Nakryiko on CC since they have come
across the occasional RCU-tasks performance issue.

Thanx, Paul

> >     Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> >     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> Sebastian
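
For anyone on CC who wants to see the expiry window from the changelog
in concrete numbers, here is a rough userspace sketch of the arithmetic
only. It is not the patch itself. The HZ value of 250 and every
sketch_* name are assumptions made up for illustration; in the kernel
the two resulting values would presumably be handed to
schedule_hrtimeout_range() as the expiry and the delta, with a
hard-IRQ hrtimer mode so that the timer does not depend on ksoftirqd.

```c
#include <stdint.h>

/* Assumption: HZ is a kernel build-time constant; 250 is only an example. */
#define SKETCH_HZ		250ULL
#define SKETCH_NSEC_PER_SEC	1000000000ULL

/* Hypothetical userspace stand-in for the kernel's jiffies_to_nsecs(). */
static uint64_t sketch_jiffies_to_nsecs(uint64_t j)
{
	return j * (SKETCH_NSEC_PER_SEC / SKETCH_HZ);
}

/* Expiry window for one grace-period wait, in nanoseconds. */
struct sketch_window {
	uint64_t earliest_ns;	/* corresponds to "fract" jiffies */
	uint64_t slack_ns;	/* corresponds to HZ / 2 jiffies */
	uint64_t latest_ns;	/* earliest_ns + slack_ns */
};

/*
 * Compute the window "between fract and fract + HZ / 2 jiffies".
 * A range timer may fire anywhere inside it, which lets it be
 * batched with other timers expiring in the same interval.
 */
static struct sketch_window sketch_gp_wait_window(uint64_t fract)
{
	struct sketch_window w;

	w.earliest_ns = sketch_jiffies_to_nsecs(fract);
	w.slack_ns = sketch_jiffies_to_nsecs(SKETCH_HZ / 2);
	w.latest_ns = w.earliest_ns + w.slack_ns;
	return w;
}
```

With the assumed HZ of 250, a fract of 25 jiffies gives a window that
opens at 100 ms and closes at 600 ms. The half-second of slack is what
allows the wakeup to be coalesced with other timers.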