On Wed, Jun 22, 2022 at 05:34:09PM -0700, Paul E. McKenney wrote: > On Wed, Jun 22, 2022 at 11:34:15PM +0000, Zhang, Qiang1 wrote: > > > > On Wed, Jun 22, 2022 at 06:35:49PM +0800, Zqiang wrote: > > > In CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y kernel, after a exp > > > grace period begins, if detected current CPU enters idle in > > > rcu_exp_handler() IPI handler, will immediately report the exp QS of the > > > current cpu, at this time, maybe not being in an RCU read-side critical > > > section, but need wait until rcu-softirq or sched-clock irq or sched-switch > > > occurs on current CPU to check and report exp QS. > > > > > > > >I think the idea is OK, however, this "optimization" is based on the > > >implementation detail that rcu_read_lock() counts preempt_count when > > >CONFIG_PREEMPT_COUNT=y, right? It's a little bit dangerous because the > > >preempt_count when CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n is mostly > > >for debugging purposes IIUC, and in other words, _it could be gone_. > > > > > > > Yes, for CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n kernel > > The rcu_read_lock/unlock are replaced by preempt_disbale/enable, and the > > preempt-count is exists, so can report exp QS when not being an RCU > > read-side critical(preempt_count & (PREEMPT_MASK | SOFTIRQ_MASK )return zero). > > in IPI handler. > > > > For CONFIG_PREEMPT_COUNT=n and CONFIG_PREEMPT=n kernel, > > The rcu_read_lock/unlock is just barrier(). > > > > > > So I add IS_ENABLED(CONFIG_PREEMPT_COUNT) check in code. > > > > Of course, for CONFIG_PREEMPT_COUNT=n kernel, in RCU softirq, the > > preempt-count is also checked > > > > /* Report any deferred quiescent states if preemption enabled. */ > > if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) { > > rcu_preempt_deferred_qs(current); > > > > but the RCU softirq may not be triggered in time and reported exp QS, for > > example a kernel loop exist on NO_HZ_FULL CPU > > > > this change, It is to capture the exp QS state earlier and report it. > > > > > > >Also I'm not aware of any but there could be someone assuming that RCU > > >read-side critical sections can be formed without > > >rcu_read_{lock,unlock}() in CONFIG_PREEMPT=n kernel. For example, there > > >might be "creative" code like the following: > > > > > > void do_something_only_in_nonpreempt(void) > > > { > > > int *p; > > > > > > // This function only gets called in PREEMPT=n kernel, > > > // which means everywhere is a RCU read-side critical > > > // section, let's save some lines of code. > > > > > rcu_read_lock(); > > > p = rcu_dereference_check(gp, !IS_ENABLED(PREEMPT)); > > > ... // of course no schedule() here. > > > <access p> > > rcu_read_unlock(); > > > } > > > > > > > Usually access to pointers of type rcu needs to be protected. Yes, _ususally_ they are, but what about the special cases? Because in PREEMPT=n kernel, almost everywhere is a RCU read-side critical section, some one might have been "creative" enough to omit these rcu_read_lock() and rcu_read_unlock(). > > Indeed, lockdep would normally complain about this sort of thing. > But in kernels built with (say) CONFIG_PREEMPT_NONE=y but without > CONFIG_PREEMPT_COUNT=N, can lockdep really tell the difference? > Actually with the rcu_dereference_check() above, lockdep cannot detect even CONFIG_PREEMPT_COUNT=y, that rcu_dereference_check() basically says "I know I'm in a read-side critical section if it's a non-preempt kernel, so don't bother to check". ;-) > > Any thoughts? > > It would be good to have some performance data on this change to expedited > grace periods. It is adding code, so it needs some real motivation. Agreed. Regards, Boqun > So, does this change make a system-level difference in (say) expedited > RCU grace-period latency, and if so, under what conditions? > > Thanx, Paul > > > >Again, I'm not aware of any existing code that does this but we need to > > >be sure. > > > > > >Regards, > > >Boqun > > > > > > This commit add a exp QS check in rcu_exp_handler(), when not being > > > in an RCU read-side critical section, report exp QS earlier. > > > > > > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx> > > > --- > > > kernel/rcu/tree_exp.h | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > > index be667583a554..34f08267410f 100644 > > > --- a/kernel/rcu/tree_exp.h > > > +++ b/kernel/rcu/tree_exp.h > > > @@ -828,11 +828,14 @@ static void rcu_exp_handler(void *unused) > > > { > > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data); > > > struct rcu_node *rnp = rdp->mynode; > > > + bool preempt_bh_disabled = > > > + !!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)); > > > > > > if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || > > > __this_cpu_read(rcu_data.cpu_no_qs.b.exp)) > > > return; > > > - if (rcu_is_cpu_rrupt_from_idle()) { > > > + if (rcu_is_cpu_rrupt_from_idle() || > > > + (IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preempt_bh_disabled)) { > > > rcu_report_exp_rdp(this_cpu_ptr(&rcu_data)); > > > return; > > > } > > > -- > > > 2.25.1 > > >