On Fri, Mar 03, 2023 at 09:38:51PM +0000, Joel Fernandes (Google) wrote: > On many systems, a great deal of boot (in userspace) happens after the > kernel thinks the boot has completed. It is difficult to determine if > the system has really booted from the kernel side. Some features like > lazy-RCU can risk slowing down boot time if, say, a callback has been > added that the boot synchronously depends on. Further expedited callbacks > can get unexpedited way earlier than it should be, thus slowing down > boot (as shown in the data below). > > For these reasons, this commit adds a config option > 'CONFIG_RCU_BOOT_END_DELAY' and a boot parameter rcupdate.boot_end_delay. > Userspace can also make RCU's view of the system as booted, by writing the > time in milliseconds to: /sys/module/rcupdate/parameters/rcu_boot_end_delay > Or even just writing a value of 0 to this sysfs node. > However, under no circumstance will the boot be allowed to end earlier > than just before init is launched. > > The default value of CONFIG_RCU_BOOT_END_DELAY is chosen as 15s. This > suites ChromeOS and also a PREEMPT_RT system below very well, which need > no config or parameter changes, and just a simple application of this patch. A > system designer can also choose a specific value here to keep RCU from marking > boot completion. As noted earlier, RCU's perspective of the system as booted > will not be marker until at least rcu_boot_end_delay milliseconds have passed > or an update is made via writing a small value (or 0) in milliseconds to: > /sys/module/rcupdate/parameters/rcu_boot_end_delay. > > One side-effect of this patch is, there is a risk that a real-time workload > launched just after the kernel boots will suffer interruptions due to expedited > RCU, which previous ended just before init was launched. However, to mitigate > such an issue (however unlikely), the user should either tune > CONFIG_RCU_BOOT_END_DELAY to a smaller value than 15 seconds or write a value > of 0 to /sys/module/rcupdate/parameters/rcu_boot_end_delay, once userspace > boots, and before launching the real-time workload. > > Qiuxu also noted impressive boot-time improvements with earlier version > of patch. An excerpt from the data he shared: > > 1) Testing environment: > OS : CentOS Stream 8 (non-RT OS) > Kernel : v6.2 > Machine : Intel Cascade Lake server (2 sockets, each with 44 logical threads) > Qemu args : -cpu host -enable-kvm, -smp 88,threads=2,sockets=2, … > > 2) OS boot time definition: > The time from the start of the kernel boot to the shell command line > prompt is shown from the console. [ Different people may have > different OS boot time definitions. ] > > 3) Measurement method (very rough method): > A timer in the kernel periodically prints the boot time every 100ms. > As soon as the shell command line prompt is shown from the console, > we record the boot time printed by the timer, then the printed boot > time is the OS boot time. > > 4) Measured OS boot time (in seconds) > a) Measured 10 times w/o this patch: > 8.7s, 8.4s, 8.6s, 8.2s, 9.0s, 8.7s, 8.8s, 9.3s, 8.8s, 8.3s > The average OS boot time was: ~8.7s > > b) Measure 10 times w/ this patch: > 8.5s, 8.2s, 7.6s, 8.2s, 8.7s, 8.2s, 7.8s, 8.2s, 9.3s, 8.4s > The average OS boot time was: ~8.3s. > > Tested-by: Qiuxu Zhuo <qiuxu.zhuo@xxxxxxxxx> > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> > --- > v1->v2: > Update some comments and description. > v2->v3: > Add sysfs param, and update with Test data. > > .../admin-guide/kernel-parameters.txt | 12 ++++ > cc_list | 8 +++ > kernel/rcu/Kconfig | 19 ++++++ > kernel/rcu/update.c | 68 ++++++++++++++++++- > 4 files changed, 106 insertions(+), 1 deletion(-) > create mode 100644 cc_list > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index 2429b5e3184b..611de90d9c13 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -5085,6 +5085,18 @@ > rcutorture.verbose= [KNL] > Enable additional printk() statements. > > + rcupdate.rcu_boot_end_delay= [KNL] > + Minimum time in milliseconds that must elapse > + before the boot sequence can be marked complete > + from RCU's perspective, after which RCU's behavior > + becomes more relaxed. The default value is also > + configurable via CONFIG_RCU_BOOT_END_DELAY. > + Userspace can also mark the boot as completed > + sooner by writing the time in milliseconds, say once > + userspace considers the system as booted, to: > + /sys/module/rcupdate/parameters/rcu_boot_end_delay > + Or even just writing a value of 0 to this sysfs node. > + > rcupdate.rcu_cpu_stall_ftrace_dump= [KNL] > Dump ftrace buffer after reporting RCU CPU > stall warning. > diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig > index 9071182b1284..4b5ffa36cbaf 100644 > --- a/kernel/rcu/Kconfig > +++ b/kernel/rcu/Kconfig > @@ -217,6 +217,25 @@ config RCU_BOOST_DELAY > > Accept the default if unsure. > > +config RCU_BOOT_END_DELAY > + int "Minimum time before RCU may consider in-kernel boot as completed" > + range 0 120000 > + default 15000 > + help > + Default value of the minimum time in milliseconds that must elapse > + before the boot sequence can be marked complete from RCU's perspective, > + after which RCU's behavior becomes more relaxed. > + Userspace can also mark the boot as completed sooner than this default > + by writing the time in milliseconds, say once userspace considers > + the system as booted, to: /sys/module/rcupdate/parameters/rcu_boot_end_delay. > + Or even just writing a value of 0 to this sysfs node. > + > + The actual delay for RCU's view of the system to be marked as booted can be > + higher than this value if the kernel takes a long time to initialize but it > + will never be smaller than this value. > + > + Accept the default if unsure. > + > config RCU_EXP_KTHREAD > bool "Perform RCU expedited work in a real-time kthread" > depends on RCU_BOOST && RCU_EXPERT > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index 19bf6fa3ee6a..93138c92136e 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -224,18 +224,84 @@ void rcu_unexpedite_gp(void) > } > EXPORT_SYMBOL_GPL(rcu_unexpedite_gp); > > +/* > + * Minimum time in milliseconds until RCU can consider in-kernel boot as > + * completed. This can also be tuned at runtime to end the boot earlier, by > + * userspace init code writing the time in milliseconds (even 0) to: > + * /sys/module/rcupdate/parameters/rcu_boot_end_delay > + */ > +static int rcu_boot_end_delay = CONFIG_RCU_BOOT_END_DELAY; > + > static bool rcu_boot_ended __read_mostly; > +static bool rcu_boot_end_called __read_mostly; > +static DEFINE_MUTEX(rcu_boot_end_lock); > + > +static int param_set_rcu_boot_end(const char *val, const struct kernel_param *kp) > +{ > + uint end_ms; > + int ret = kstrtouint(val, 0, &end_ms); > + > + if (ret) > + return ret; > + WRITE_ONCE(*(uint *)kp->arg, end_ms); > + > + /* > + * rcu_end_inkernel_boot() should be called at least once during init > + * before we can allow param changes to end the boot. > + */ > + mutex_lock(&rcu_boot_end_lock); > + rcu_boot_end_delay = end_ms; > + if (!rcu_boot_ended && rcu_boot_end_called) { > + mutex_unlock(&rcu_boot_end_lock); > + rcu_end_inkernel_boot(); > + } > + mutex_unlock(&rcu_boot_end_lock); > + return ret; > +} > + > +static const struct kernel_param_ops rcu_boot_end_ops = { > + .set = param_set_rcu_boot_end, > + .get = param_get_uint, > +}; > +module_param_cb(rcu_boot_end_delay, &rcu_boot_end_ops, &rcu_boot_end_delay, 0644); > > /* > - * Inform RCU of the end of the in-kernel boot sequence. > + * Inform RCU of the end of the in-kernel boot sequence. The boot sequence will > + * not be marked ended until at least rcu_boot_end_delay milliseconds have passed. > */ > +void rcu_end_inkernel_boot(void); > +static void rcu_boot_end_work_fn(struct work_struct *work) > +{ > + rcu_end_inkernel_boot(); > +} > +static DECLARE_DELAYED_WORK(rcu_boot_end_work, rcu_boot_end_work_fn); > + > void rcu_end_inkernel_boot(void) > { > + mutex_lock(&rcu_boot_end_lock); > + rcu_boot_end_called = true; > + > + if (rcu_boot_ended) > + return; > + > + if (rcu_boot_end_delay) { > + u64 boot_ms = div_u64(ktime_get_boot_fast_ns(), 1000000UL); > + > + if (boot_ms < rcu_boot_end_delay) { > + schedule_delayed_work(&rcu_boot_end_work, > + rcu_boot_end_delay - boot_ms); <snip> urezki@pc638:~/data/raid0/coding/linux-rcu.git$ git diff diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 93138c92136e..93f426f0f4ec 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -289,7 +289,7 @@ void rcu_end_inkernel_boot(void) if (boot_ms < rcu_boot_end_delay) { schedule_delayed_work(&rcu_boot_end_work, - rcu_boot_end_delay - boot_ms); + msecs_to_jiffies(rcu_boot_end_delay - boot_ms)); mutex_unlock(&rcu_boot_end_lock); return; } urezki@pc638:~/data/raid0/coding/linux-rcu.git$ <snip> I think you need to apply above patch. I am not sure maybe Paul has already mentioned about it. But just in case. -- Uladzislau Rezki