On Fri, Nov 01, 2019 at 02:19:13AM +0800, Lai Jiangshan wrote:
> 
> 
> On 2019/10/31 9:43 PM, Paul E. McKenney wrote:
> > On Thu, Oct 31, 2019 at 10:07:56AM +0000, Lai Jiangshan wrote:
> > > If exp_deferred_qs is incorrectly set and leaked to the next
> > > exp GP, it may cause the next GP to be incorrectly prematurely
> > > completed.
> > 
> > Could you please provide the sequence of events leading to such a
> > failure?
> 
> I just felt nervous about "leaking" exp_deferred_qs.
> I didn't carefully consider the sequence of events.
> 
> It now turns out that I had misunderstood exp_deferred_qs,
> so "leaking" is the wrong concept here: preempt_disable()
> is treated like rcu_read_lock(), and exp_deferred_qs
> needs to be set.

Thank you for checking, and yes, this code is a bit subtle.
So good on you for digging into it!

							Thanx, Paul

> Thanks
> Lai
> 
> ============ don't need to read:
> 
> rcu_read_lock()
> // other cpu starts exp GP_A
> preempt_schedule() // queues itself
> rcu_read_unlock() // reports qs; other cpu is sending an IPI to me
> preempt_disable()
> rcu_exp_handler() interrupt for GP_A leaves an exp_deferred_qs
> // exp GP_A finished
> ---------------- above is one possible way to leave an exp_deferred_qs
> preempt_enable()
>   interrupt before preempt_schedule()
>   rcu_read_lock()
>   rcu_read_unlock()
>     NESTED interrupt while rcu_read_lock_nesting is negative
>     rcu_read_lock()
>     // other cpu starts exp GP_B
>       NESTED interrupt for rcu_flavor_sched_clock_irq()
>       reports exp qs, since rcu_read_lock_nesting < 0 and
>         exp_deferred_qs is true
>     // exp GP_B completes
>     rcu_read_unlock()
> 
> This plausible sequence also relies on NESTED interrupts, and
> would be avoided by patch2 even if NESTED interrupts were allowed.
> 
> > 
> > Also, did you provoke such a failure in testing?  If so, an upgrade
> > to rcutorture would be good, so please tell me what you did to make
> > the failure happen.
> > 
> > I do like the reduction in state space, but I am a bit concerned about
> > the potential increase in contention on rnp->lock.  Thoughts?
> > 
> > 							Thanx, Paul
> > 
> > > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> > > ---
> > >  kernel/rcu/tree_exp.h | 23 ++++++++++++++---------
> > >  1 file changed, 14 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index a0e1e51c51c2..6dec21909b30 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -603,6 +603,18 @@ static void rcu_exp_handler(void *unused)
> > >  	struct rcu_node *rnp = rdp->mynode;
> > >  	struct task_struct *t = current;
> > >  
> > > +	/*
> > > +	 * Note that there is a large group of race conditions that
> > > +	 * can have caused this quiescent state to already have been
> > > +	 * reported, so we really do need to check ->expmask first.
> > > +	 */
> > > +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > +	if (!(rnp->expmask & rdp->grpmask)) {
> > > +		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > +		return;
> > > +	}
> > > +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > +
> > >  	/*
> > >  	 * First, the common case of not being in an RCU read-side
> > >  	 * critical section.  If also enabled or idle, immediately
> > > @@ -628,17 +640,10 @@ static void rcu_exp_handler(void *unused)
> > >  	 * a future context switch.  Either way, if the expedited
> > >  	 * grace period is still waiting on this CPU, set ->deferred_qs
> > >  	 * so that the eventual quiescent state will be reported.
> > > -	 * Note that there is a large group of race conditions that
> > > -	 * can have caused this quiescent state to already have been
> > > -	 * reported, so we really do need to check ->expmask.
> > >  	 */
> > >  	if (t->rcu_read_lock_nesting > 0) {
> > > -		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > -		if (rnp->expmask & rdp->grpmask) {
> > > -			rdp->exp_deferred_qs = true;
> > > -			t->rcu_read_unlock_special.b.exp_hint = true;
> > > -		}
> > > -		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > +		rdp->exp_deferred_qs = true;
> > > +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > >  		return;
> > >  	}
> > > -- 
> > > 2.20.1
> > > 