Re: [PATCH RFC v2] rcu: Add a minimum time for marking boot as completed

Frederic Weisbecker <frederic@xxxxxxxxxx> · Wed, 1 Mar 2023 18:11:26 +0100

On Tue, Feb 28, 2023 at 08:09:02PM +0000, Joel Fernandes wrote:
> Hi Frederic,
> 
> On Tue, Feb 28, 2023 at 12:04:36PM +0100, Frederic Weisbecker wrote:
> Both should play a role. For lazy, we found callbacks that showed later in
> the full boot sequence (like the SCSI issue).
> 
> For expedited, there is new data from Qiuxu showing 5% improvement in boot
> time.

Indeed, nice one!

> Right, and we also don't know if in the future, somebody queues a CB that
> slows down boot as well (say they queue a lazy CB that does a wakeup), even
> if currently there are not any such. As noted, that SCSI issue did show. Just
> to note, callbacks doing wakeups are supposed to call call_rcu_hurry().

Sure, ideally we should have a way to debug lazy callbacks involved in wait/wake
dependency against the enqueuer but that doesn't sound easy to do...

> Yes, sorry, it was more an RFC but still should have been more clear. For the
> v3 I'll definitely make it clear.
> 
> rcu_end_inkernel_boot() is called before init is run. But the kernel cannot
> posibly know when init has finished running and say the system is now waiting
> for user login, or something. There's a considerable amount time from
> rcu_end_inkernel_boot() to when the system is actually "booted". That's the
> main issue. We could look at CPU load, but that's not ideal. Maybe wait for
> user input, but that sucks as well.

So it looks like user boot is involved in what you want to wait for. In that
case only userspace can tell.

> Hmmm I see what you mean, so a conservative and configurable "fail-safe"
> timeout followed by sysctl to end the boot earlier than the timeout, should
> do it (something like 30 seconds IMHO sounds reasonable)? In any case,
> whatever way we go, we would not end the kernel boot before
> rcu_end_inkernel_boot() is called at least once (which is the current
> behavior).
> 
> So it would be:
> 
>   low level boot + initcalls
>        20 sec                         30 second timeout
> |------------------------------|--------------------------
>                                |                         |
> 	        old rcu_end_inkernel_boot()      new rcu_end_inkernel_boot()
> 
> But it could be, if user decides:
>   low level boot + initcalls
>        20 sec                         10 second timeout
> |------------------------------|--------------------------
>                                |                         |
> 	        old rcu_end_inkernel_boot()      new rcu_end_inkernel_boot()
> 		                                 via /sys/ entry.

The problem I have with a random default timeout is that it may break sensitive
workloads. If the default is 30 and say the boot only takes 5 seconds and
immediately launches a latency sensitive task, this may break things in a
subtle way during these 25 seconds when it usually didn't. Because expedited
RCU is a hammer interrupting all non-idle CPUs.

Until now forcing expedited RCU was only performed before any user code. Now it
crosses the boundary so better be careful. I'd personally advocate for keeping
the current call before init is launched. Then provide an end_boot_sysctl kernel
boot parameter that will ignore the current call before init and let the user
signal that through sysctl.

> > > > So shouldn't we disable lazy callbacks by default when CONFIG_RCU_LAZY=y and then
> > > > turn it on with "sysctl kernel.rcu.lazy=1" only whenever userspace feels ready
> > > > about it? We can still keep the current call to rcu_end_inkernel_boot().
> > > 
> > > Hmm IMHO that would add more knobs for not much reason honestly. We already
> > > have CONFIG_RCU_LAZY default disabled, I really don't want to add more
> > > dependency (like user enables the config and does not see laziness).
> > 
> > I don't know. Like I said, different problems, different solutions. Let's
> > identify what the issue is precisely. For example can we expect that the issues
> > on boot can be a problem also on some temporary workloads?
> > 
> > Besides I'm currently testing a very hacky flavour of rcu_lazy and so far it
> > shows many idle calls that would have been delayed if callbacks weren't queued
> > as lazy.
> 
> Can you provide more details? What kind of hack flavor, and what is it doing?

Sorry, I meant a hacky implementation of lazy to work with !NOCB.

Thanks.