Hi folks,

On Fri, 01 Jul 2022 04:15:45 +0100,
Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx> wrote:
>
> Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited
> grace periods") highlights a problem where aggressively blocking
> SRCU expedited grace periods, as was introduced in commit
> 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> from consuming CPU"), introduces ~2 minutes delay to the overall
> ~3.5 minutes boot time, when starting VMs with "-bios QEMU_EFI.fd"
> cmdline on qemu, which results in very high rate of memslots
> add/remove, which causes > ~6000 synchronize_srcu() calls for
> kvm->srcu SRCU instance.
>
> Below table captures the experiments done by Zhangfei Gao and Shameer
> to measure the boottime impact with various values of non-sleeping
> per phase counts, with HZ_250 and preemption enabled:
>
> +──────────────────────────+────────────────+
> | SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
> +──────────────────────────+────────────────+
> | 100                      | 30.053         |
> | 150                      | 25.151         |
> | 200                      | 20.704         |
> | 250                      | 15.748         |
> | 500                      | 11.401         |
> | 1000                     | 11.443         |
> | 10000                    | 11.258         |
> | 1000000                  | 11.154         |
> +──────────────────────────+────────────────+
>
> Analysis on the experiment results showed improved boot time
> with non blocking delays close to one jiffy duration. This
> was also seen when number of per-phase iterations were scaled
> to one jiffy.
>
> So, this change scales per-grace-period phase number of non-sleeping
> polls, such that, non-sleeping polls are done for one jiffy. In addition
> to this, srcu_get_delay() call in srcu_gp_end(), which is used to calculate
> the delay used for scheduling callbacks, is replaced with the check for
> expedited grace period. This is done, to schedule cbs for completed expedited
> grace periods immediately, which results in improved boot time seen in
> experiments.
>
> Testing done by Marc and Zhangfei confirms that this change recovers
> most of the performance degradation in boottime; for CONFIG_HZ_250 configuration,
> boottime improves from 3m50s to 41s on Marc's setup; and from 2m40s to ~9.7s
> on Zhangfei's setup.
>
> In addition to the changes to default per phase delays, this change
> adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
> srcutree.srcu_max_nodelay_phase, srcutree.srcu_retry_check_delay.
> This allows users to configure the srcu grace period scanning delays,
> depending on their system configuration requirements.
>
> Signed-off-by: Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx>
> Tested-by: Marc Zyngier <maz@xxxxxxxxxx>
> Tested-by: Zhangfei Gao <zhangfei.gao@xxxxxxxxxx>

Is there any chance for this fix to make it into 5.19? The regression
is significant enough on low-end systems, and I'd rather see it
addressed.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.