Re: [PATCH RFC v2] rcu: Add a minimum time for marking boot as completed

On Mon, Feb 27, 2023 at 6:05 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
[...]
> > > > >>>>> On Mon, Feb 27, 2023 at 08:22:06AM -0500, Joel Fernandes wrote:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>> On Feb 27, 2023, at 2:53 AM, Zhuo, Qiuxu <qiuxu.zhuo@xxxxxxxxx> wrote:
> > > > >>>>>>>
> > > > >>>>>>> 
> > > > >>>>>>>>
> > > > >>>>>>>> From: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > > >>>>>>>> Sent: Saturday, February 25, 2023 11:34 AM
> > > > >>>>>>>> To: linux-kernel@xxxxxxxxxxxxxxx
> > > > >>>>>>>> Cc: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>; Frederic Weisbecker
> > > > >>>>>>>> <frederic@xxxxxxxxxx>; Lai Jiangshan <jiangshanlai@xxxxxxxxx>; linux-
> > > > >>>>>>>> doc@xxxxxxxxxxxxxxx; Paul E. McKenney <paulmck@xxxxxxxxxx>;
> > > > >>>>>>>> rcu@xxxxxxxxxxxxxxx
> > > > >>>>>>>> Subject: [PATCH RFC v2] rcu: Add a minimum time for marking boot as
> > > > >>>>>>>> completed
> > > > >>>>>>>>
> > > > >>>>>>>> On many systems, a great deal of boot happens after the kernel thinks the
> > > > >>>>>>>> boot has completed. It is difficult to determine if the system has really
> > > > >>>>>>>> booted from the kernel side. Some features like lazy-RCU can risk slowing
> > > > >>>>>>>> down boot time if, say, a callback has been added that the boot
> > > > >>>>>>>> synchronously depends on.
> > > > >>>>>>>>
> > > > >>>>>>>> Further, it is better to boot systems which pass 'rcu_normal_after_boot' to
> > > > >>>>>>>> stay expedited for as long as the system is still booting.
> > > > >>>>>>>>
> > > > >>>>>>>> For these reasons, this commit adds a config option
> > > > >>>>>>>> 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter
> > > > >>>>>>>> rcupdate.boot_end_delay.
> > > > >>>>>>>>
> > > > >>>>>>>> By default, this value is 20s. A system designer can choose to specify a value
> > > > >>>>>>>> here to keep RCU from marking boot completion.  The boot sequence will not
> > > > >>>>>>>> be marked ended until at least boot_end_delay milliseconds have passed.
> > > > >>>>>>>
> > > > >>>>>>> Hi Joel,
> > > > >>>>>>>
> > > > >>>>>>> Just some thoughts on the default value of 20s, correct me if I'm wrong :-).
> > > > >>>>>>>
> > > > >>>>>>> Does an OS with a CONFIG_PREEMPT_RT=y kernel care more about
> > > > >>>>>>> real-time latency than about the overall OS boot time?
> > > > >>>>>>
> > > > >>>>>> But every system has to boot, even an RT system.
> > > > >>>>>>
> > > > >>>>>>>
> > > > >>>>>>> If so, we might make rcupdate.boot_end_delay = 0 as the default value
> > > > >>>>>>> (NOT the default 20s) for CONFIG_PREEMPT_RT=y kernels?
> > > > >>>>>>
> > > > >>>>>> Could you measure how much time your RT system takes to boot before the application runs?
> > > > >>>>>>
> > > > >>>>>> I can change it to default 0 essentially NOOPing it, but I would rather have a saner default (10 seconds even), than having someone forget to tune this for their system.
> > > > >>>>>
> > > > >>>>> Provide a /sys location that the userspace code writes to when it
> > > > >>>>> is ready?  Different systems with different hardware and software
> > > > >>>>> configurations are going to take different amounts of time to boot,
> > > > >>>>> correct?
> > > > >>>>
> > > > >>>> I could add a sysfs node, but I still wanted this patch as well
> > > > >>>> because I am wary of systems where yet more userspace changes are
> > > > >>>> required. I feel the kernel should itself be able to do this. Yes, it
> > > > >>>> is possible the system completes "booting" at a different time than
> > > > >>>> what the kernel thinks. But it does that anyway (even without this
> > > > >>>> patch), so I am not seeing a good reason to not do this in the kernel.
> > > > >>>> It is also only a minimum cap, so if the in-kernel boot takes too
> > > > >>>> long, then the patch will have no effect.
> > > > >>>>
> > > > >>>> Thoughts?
> > > > >>>>
> > > > >>> Why is "rcu_boot_ended" not enough? As I see it, right after that an "init"
> > > > >>> process or shell or panic is going to be invoked by the kernel. It basically
> > > > >>> indicates that the kernel is fully functional.
> > > > >>>
> > > > >>> Or is the idea to wait even further, until all kernel modules are
> > > > >>> loaded by user space?
> > > > >>
> > > > >> As I mentioned in the commit message, it is daemons, userspace initialization, etc. There is a lot of userspace booting up as well, and it uses the kernel while doing so.
> > > > >>
> > > > >> So it does not make sense to me to mark the kernel as booted too early. And there is no harm in adding some built-in kernel hysteresis. What am I missing?
> > > > >>
> > > > > Then it is up to user space to decide when it is ready in terms of "boot completed".
> > > >
> > > > I don't know if you caught up with the other threads. See replies from Paul and my reply to that.
> > > >
> > > > Also what you are proposing can be more harmful. If user space has a bug and does not notify the kernel that boot completed, then the boot can stay incomplete forever. The idea with this patch is to make things better, not worse.
> > > >
> > > I saw that Paul proposed to have a sysfs attribute using which you can
> > > send a notification.
> >
> > Maybe I am missing something but how will a sysfs node on its own work really?
> >
> > 1. delete kernel marking itself boot completed  -- and then sysfs
> > marks it completed?
> >
> > 2. delete kernel marking itself boot completed  -- and then sysfs
> > marks it completed; if the sysfs write does not arrive within N seconds,
> > then the kernel marks it as completed?
> >
> > #1 is a no go, that just means a bug waiting to happen if userspace
> > forgets to write to sysfs.
> >
> > #2 is just an extension of this patch. So I can add a sysfs node on
> > top of this. And we can make the minimum time as a long period of
> > time, as you noted below:
> >
> > > IMHO, to me this patch does not provide a clear correlation between what
> > > is a boot complete and when it occurs. A boot complete is a synchronous
> > > event whereas the patch thinks that after some interval a "boot" is completed.
> >
> > But that is exactly how the kernel code is now without this patch, so
> > it is already broken in that sense, I am not really breaking it more
> > ;-)
> >
> > > We can imply that after, say 100 seconds an initialization of user space
> > > is done. Maybe 100 seconds then? :)
> >
> > Yes I am Ok with that. So are you suggesting we change the default to
> > 100 seconds and then add a sysfs node to mark as boot done whenever
> > userspace notifies?
>
> The combination of sysfs manipulated by userspace and a kernel failsafe
> makes sense to me.  Especially if by default triggering the failsafe
> splats.  That way, bugs where userspace fails to update the sysfs file
> get caught.

If the "splat" could be an info-level message rather than a WARN_ON,
that would work for me. I am worried about Android and other folks who
upgrade to the new kernel only to find they now have to patch userspace.

So,
pr_info("RCU is still in boot-mode for the next N seconds, please
consider writing X to /sys/.. to avoid this message.");
?
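
To make the combination concrete, here is a rough userspace model in C
of the decision logic being discussed: boot counts as ended once the
kernel's own init is done and either userspace has notified or the
failsafe timeout has passed, with a one-time info message in between.
This is only a sketch, not code from the patch. The struct, its fields
and boot_ended() are hypothetical names, printf() stands in for
pr_info(), and N, X and the /sys/.. path are left as placeholders just
as in the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state; none of these names come from the patch. */
struct boot_state {
	bool kernel_init_done;     /* kernel reached the point where it marks boot done today */
	bool userspace_notified;   /* userspace wrote to the (placeholder) sysfs file */
	unsigned long failsafe_ms; /* failsafe timeout in milliseconds since boot */
	bool warned;               /* info message already printed once */
};

/* Returns true once RCU may leave boot mode at time now_ms since boot. */
static bool boot_ended(struct boot_state *s, unsigned long now_ms)
{
	assert(s != NULL);

	/* The timeout is only a minimum cap: never end before kernel init. */
	if (!s->kernel_init_done)
		return false;

	/* Normal path: userspace said it is ready. */
	if (s->userspace_notified)
		return true;

	/* Failsafe path: userspace never notified, so give up waiting. */
	if (now_ms >= s->failsafe_ms)
		return true;

	/* Still waiting: print the informational message once (stand-in for pr_info). */
	if (!s->warned) {
		printf("RCU is still in boot-mode for the next %lu seconds, "
		       "please consider writing X to /sys/.. to avoid this message.\n",
		       (s->failsafe_ms - now_ms) / 1000);
		s->warned = true;
	}
	return false;
}
```

With failsafe_ms set to 100 * 1000, a system whose userspace never
writes the file would leave boot mode at 100 seconds, while one that
does write it would leave as soon as the write arrives.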

> The non-default silent-failsafe mode is also useful to allow some power
> savings in advance of userspace getting the sysfs updating in place.
> And of course the default splatting setup can be used in internal testing
> with the release software being more tolerant of userspace foibles.

Sounds good, would 100 seconds be a good fail-safe trigger value?

Thanks,

 - Joel



