Re: [PATCH] rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Fri, 6 May 2022 11:59:55 -0700

On Fri, May 06, 2022 at 12:43:35PM +0000, Zhang, Qiang1 wrote:
> 
> On Fri, May 06, 2022 at 12:40:09AM +0000, Zhang, Qiang1 wrote:
> > On Thu, May 05, 2022 at 11:52:36PM +0800, Zqiang wrote:
> > > Currently, the rnp's cbovldmask is set in call_rcu(). When a CPU goes
> > > offline, the outgoing CPU's callbacks are migrated to the target CPU,
> > > so my_rdp may become overloaded with callbacks. If it is overloaded and
> > > there is no call_rcu() call on the target CPU for a long time, the rnp's
> > > cbovldmask is not set in time. To fix this, add
> > > check_cb_ovld_locked() in rcutree_migrate_callbacks() to help the CPU
> > > reach quiescent states more quickly.
> > > 
> > > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> > 
> > >Doesn't this get set right at the end of the current grace period?
> > >Given that there is a callback overload, there should be a grace period in progress.
> > >
> > >See this code in rcu_gp_cleanup():
> > >
> > >		if (rcu_is_leaf_node(rnp))
> > >			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> > >				rdp = per_cpu_ptr(&rcu_data, cpu);
> > >				check_cb_ovld_locked(rdp, rnp);
> > >			}
> > >
> > >So what am I missing here?  Or are you planning to remove the above code?
> > 
> > We only check the overloaded rdp at the end of the current grace period.
> > If my_rdp becomes overloaded because callbacks were migrated to it, and
> > my_rdp->mynode's cbovldmask is empty, the overload on my_rdp may not be
> > checked at the end of the current grace period.
> >
> >Very good!
> >
> > I have another question: why don't we call check_cb_ovld_locked() when rdp's n_cbs decreases?
> > For example, call check_cb_ovld_locked() in rcu_do_batch(), not at the end of the grace period.
> 
> >The idea (as you noted above) is that it gets cleared at the end of each grace period.  We could also clear it in rcu_do_batch() as you suggest, but to make that change you would need to convince me that the extra overhead and complexity would provide a useful benefit.  This will not be easy.  ;-)
> 
> > >If so, wouldn't you also need to clear the indication for the CPU that is going offline, being careful to handle the case where the two CPUs have different leaf rcu_node structures?
> > 
> > Yes, the offline CPU's bit needs to be cleared.
> >
> >But again, the clearing happens at the end of the next grace period.
> >Here we lose (almost) nothing by leaving the bit set because the other bit is set as well.
> >
> >Another question, as long as we brought up rcu_do_batch().
> >
> >Why have the local variable "empty" given that the local variable "count"
> >could be checked against zero?
> 
> Thanks for the reminder.
> I noticed that when RCU_NOCB_CPU and DEBUG_OBJECTS_RCU_HEAD are not enabled,
> a double call_rcu() causes rdp->cblist's len to increase, but the
> number of objects actually in rdp->cblist does not change.  The
> WARN_ON_ONCE(!IS_ENABLED(CONFIG_RCU_NOCB_CPU) && count != 0 && empty)
> will then be triggered.

In this case, the system is probably dead anyway due to the callback being
reused.  But good point, this is a case where the counts can diverge.

Let this be a lesson to you.  Never invoke call_rcu() on an rcu_head
structure that is already queued waiting for a grace period to elapse.  ;-)

> When RCU_NOCB_CPU is enabled, even without a double call_rcu(), some objects
> may be in the rdp->nocb_bypass list due to nocb bypass.  This causes the count
> to be non-zero when the rdp->cblist list is empty.

Exactly!  Very good!!!

							Thanx, Paul

> >In the meantime, I have queued your commit for v5.20, thank you and good eyes!  As always, I could not resist the urge to wordsmith the commit log, so could you please check it for errors?
> 
> Thank you very much.
> 
> >							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 5c36f04bd460246dd28c178ce5dce6fb02f898e1
> Author: Zqiang <qiang1.zhang@xxxxxxxxx>
> Date:   Thu May 5 23:52:36 2022 +0800
> 
>     rcu: Add rnp->cbovldmask check in rcutree_migrate_callbacks()
>     
>     Currently, the rcu_node structure's ->cbovldmask field is set in call_rcu()
>     when a given CPU is suffering from callback overload.  But if that CPU
>     goes offline, the outgoing CPU's callbacks are migrated to the running
>     CPU, which is likely to overload the running CPU.  However, that CPU's
>     bit in its leaf rcu_node structure's ->cbovldmask field remains zero.
>     
>     Initially, this is OK because the outgoing CPU's bit remains set.
>     However, that bit will be cleared at the next end of a grace period,
>     at which time it is quite possible that the running CPU will still
>     be overloaded.  If the running CPU invokes call_rcu(), then overload
>     will be checked for and the bit will be set.  Except that there is no
>     guarantee that the running CPU will invoke call_rcu(), in which case the
>     next grace period will fail to take the running CPU's overload condition
>     into account.  Plus, because the bit is not set, the end of the grace
>     period won't check for overload on this CPU.
>     
>     This commit therefore adds a call to check_cb_ovld_locked() in
>     rcutree_migrate_callbacks() to set the running CPU's ->cbovldmask bit
>     appropriately.
>     
>     Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 9dc4c4e82db62..bcc5876c9753b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4577,6 +4577,7 @@ void rcutree_migrate_callbacks(int cpu)
>  	needwake = needwake || rcu_advance_cbs(my_rnp, my_rdp);
>  	rcu_segcblist_disable(&rdp->cblist);
>  	WARN_ON_ONCE(rcu_segcblist_empty(&my_rdp->cblist) != !rcu_segcblist_n_cbs(&my_rdp->cblist));
> +	check_cb_ovld_locked(my_rdp, my_rnp);
>  	if (rcu_rdp_is_offloaded(my_rdp)) {
>  		raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
>  		__call_rcu_nocb_wake(my_rdp, true, flags);