On Tue, Aug 30, 2022 at 04:31:51PM +0800, Zqiang wrote:
> For PREEMPT_RCU, the rcu_report_dead() is invoked means that the
> outgoing CPU mask is clear from leaf rcu_node and has no further
> need of RCU, so invoke rcu_preempt_depth() return value is always
> zero in rcu_report_dead(), if the current outgoing CPU rcu_data
> structure's cpu_no_qs.b.exp is true, the rcu_preempt_deferred_qs()
> will invoke rcu_report_exp_rdp() to report exp QS.
>
> for non-PREEMPT_RCU, the rcu_preempt_deferred_qs() is equivalent to
> rcu_report_exp_rdp().
>
> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
>
>Nice!
>
>One question... Currently, for PREEMPT_RCU, the outgoing CPU silently
>reports a quiescent state even if there was a bug that resulted in that
>CPU still being in an RCU read-side critical section. With your change,
>the outgoing CPU would silently refuse to report a quiescent state.
>
>Is there something along the CPU-offline code path that already complains
>about this situation? If not, I believe that the first WARN_ON_ONCE()
>in rcu_implicit_dynticks_qs() would complain.

In the following code, the outgoing CPU reports a QS if
(rnp->qsmask & mask) is true, which means that the
WARN_ON_ONCE(!rcu_rdp_cpu_online(rdp)) is not triggered.

	if (rnp->qsmask & mask) { /* RCU waiting on outgoing CPU? */
		/* Report quiescent state -before- changing ->qsmaskinitnext! */
		rcu_disable_urgency_upon_qs(rdp);
		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
	}
	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);

>
>Could you please try this, just so we know what happens in this case?
>One way of forcing this would be to do rcu_read_lock() just before the
>call to rcu_report_dead(), though other diagnostics might require that
>rcu_read_lock() to be earlier in the code.
>
>
>Another question in both cases... There is a more subtle change where the
>old code ignores rdp->cpu_no_qs.b.exp (thus invoking rcu_report_exp_rdp()
>unconditionally) and the new code avoids invoking rcu_report_exp_rdp()
>unless this is set. How does this interact with a new expedited
>grace period that starts just as this CPU calls rcu_report_dead()?

1. When a new expedited grace period starts just as this CPU calls
   rcu_report_dead(), and this CPU's rcu_data structure's
   cpu_no_qs.b.exp is not yet set, rcu_preempt_deferred_qs() will not
   call rcu_report_exp_rdp(). But by the time rcu_report_dead() is
   called, this CPU is already offline (cpu_is_offline() returns true
   for it).

2. In __sync_rcu_exp_select_node_cpus(), smp_call_function_single()
   for this CPU returns -ENXIO, and the code then checks
   (rnp->qsmaskinitnext & mask) and (rnp->expmask & mask).

3. If rcu_report_dead() has not yet cleared this CPU's bit from
   rnp->qsmaskinitnext at that point, step 2 is repeated:
   smp_call_function_single() is called again, it again returns
   -ENXIO, and (rnp->qsmaskinitnext & mask) and (rnp->expmask & mask)
   are rechecked, until rcu_report_dead() clears this CPU's bit from
   rnp->qsmaskinitnext.

Therefore, __sync_rcu_exp_select_node_cpus() will call
rcu_report_exp_cpu_mult() to report this offline CPU's exp QS.

Thanks
Zqiang

>The expedited grace-period code in __sync_rcu_exp_select_node_cpus()
>is of special concern here.
>
>							Thanx, Paul
> ---
>  kernel/rcu/tree.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 6bb8e72bc815..0ca21ac0f064 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4276,8 +4276,6 @@ void rcu_report_dead(unsigned int cpu)
>  	// Do any dangling deferred wakeups.
>  	do_nocb_deferred_wakeup(rdp);
>
> -	/* QS for any half-done expedited grace period. */
> -	rcu_report_exp_rdp(rdp);
>  	rcu_preempt_deferred_qs(current);
>
>  	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
> --
> 2.25.1
>
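
For reference, the retry sequence described in steps 1-3 of the reply
can be modeled in userspace roughly as below. This is only a sketch
under stated assumptions, not the kernel code: struct model_rnp,
model_ipi_offline_cpu() and model_exp_select() are hypothetical names
made up for illustration, locking and the delay between retries are
left out, and the progress of rcu_report_dead() on the outgoing CPU is
reduced to a simple poll counter.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two rcu_node masks discussed above. */
struct model_rnp {
	unsigned long qsmaskinitnext;	/* CPUs RCU still treats as online */
	unsigned long expmask;		/* CPUs the expedited GP waits on */
};

/* Stand-in for smp_call_function_single() aimed at an offline CPU. */
static int model_ipi_offline_cpu(void)
{
	return -ENXIO;
}

/*
 * Model of the retry path for one outgoing CPU.  "polls_until_dead" is
 * the number of failed IPIs that happen before the modeled
 * rcu_report_dead() clears the CPU's bit from ->qsmaskinitnext.
 * Returns the mask that would be handed to rcu_report_exp_cpu_mult();
 * *retries counts how many times the loop went around again.
 */
static unsigned long model_exp_select(struct model_rnp *rnp,
				      unsigned long mask,
				      int polls_until_dead, int *retries)
{
	unsigned long mask_ofl_test = 0;
	int polls = 0;

	assert(mask != 0);
	*retries = 0;
	for (;;) {
		if (model_ipi_offline_cpu() == 0)
			return 0;	/* CPU would report its own QS. */

		/* Modeled rcu_report_dead() progress on the outgoing CPU. */
		if (polls++ >= polls_until_dead)
			rnp->qsmaskinitnext &= ~mask;

		if ((rnp->qsmaskinitnext & mask) && (rnp->expmask & mask)) {
			(*retries)++;	/* Still marked online: try again. */
			continue;
		}
		break;
	}
	/* CPU is gone from ->qsmaskinitnext: report the QS on its behalf. */
	if (rnp->expmask & mask)
		mask_ofl_test |= mask;
	return mask_ofl_test;
}
```

In this model, a node with both bits set for the outgoing CPU keeps
retrying until the modeled rcu_report_dead() clears ->qsmaskinitnext,
and only then is the CPU's bit returned for reporting. If ->expmask
does not contain the CPU, nothing is returned and no retry happens.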