On Wed, Jul 21, 2021 at 12:19:02AM +0800, Zhouyi Zhou wrote:
> On Tue, Jul 20, 2021 at 10:16 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> >
> > On Tue, 20 Jul 2021 at 15:18, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > [...]
> > > Good catch!  And yes, this would be hard to reproduce.
> > >
> > > How about as shown below?
> >
> > Acked-by: Marco Elver <elver@xxxxxxxxxx>
> >
> > I was merely a little surprised syzbot was able to exercise RCU in a
> > way that resulted in a data race your torture runs hadn't found yet
> > (or perhaps it did and missed?).
>
> I think rcu_state.n_force_qs is used to give the grace period a kick.
> In the worst case, the data race will cause the grace period to miss at
> most one kick, but the grace period will get kicked soon anyway because
> of too many callbacks.  I guess that is why rcutorture does not find
> this data race.

In theory, rcutorture -could- find it, but it would require a call_rcu()
or similar just at the beginning of the grace period.  This is a tough
target to hit.

							Thanx, Paul

> Many thanks
> Zhouyi
>
> > Thanks,
> > -- Marco
> >
> > > 							Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > commit 43e0f01f3b6f510dbe31d02a8f4c909c45deff04
> > > Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > Date:   Tue Jul 20 06:16:27 2021 -0700
> > >
> > >     rcu: Mark accesses to rcu_state.n_force_qs
> > >
> > >     This commit marks accesses to the rcu_state.n_force_qs.  These data
> > >     races are hard to make happen, but syzkaller was equal to the task.
> > >
> > >     Reported-by: syzbot+e08a83a1940ec3846cd5@xxxxxxxxxxxxxxxxxxxxxxxxx
> > >     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index a7379c44a2366..245bca7cdf6ee 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1913,7 +1913,7 @@ static void rcu_gp_fqs(bool first_time)
> > >  	struct rcu_node *rnp = rcu_get_root();
> > >
> > >  	WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > -	rcu_state.n_force_qs++;
> > > +	WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1);
> > >  	if (first_time) {
> > >  		/* Collect dyntick-idle snapshots. */
> > >  		force_qs_rnp(dyntick_save_progress_counter);
> > > @@ -2556,7 +2556,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> > >  	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
> > >  	if (count == 0 && rdp->qlen_last_fqs_check != 0) {
> > >  		rdp->qlen_last_fqs_check = 0;
> > > -		rdp->n_force_qs_snap = rcu_state.n_force_qs;
> > > +		rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> > >  	} else if (count < rdp->qlen_last_fqs_check - qhimark)
> > >  		rdp->qlen_last_fqs_check = count;
> > >
> > > @@ -2904,10 +2904,10 @@ static void __call_rcu_core(struct rcu_data *rdp, struct rcu_head *head,
> > >  	} else {
> > >  		/* Give the grace period a kick. */
> > >  		rdp->blimit = DEFAULT_MAX_RCU_BLIMIT;
> > > -		if (rcu_state.n_force_qs == rdp->n_force_qs_snap &&
> > > +		if (READ_ONCE(rcu_state.n_force_qs) == rdp->n_force_qs_snap &&
> > >  		    rcu_segcblist_first_pend_cb(&rdp->cblist) != head)
> > >  			rcu_force_quiescent_state();
> > > -		rdp->n_force_qs_snap = rcu_state.n_force_qs;
> > > +		rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> > >  		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
> > >  	}
> > >  }
> > > @@ -4134,7 +4134,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > >  	/* Set up local state, ensuring consistent view of global state. */
> > >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > >  	rdp->qlen_last_fqs_check = 0;
> > > -	rdp->n_force_qs_snap = rcu_state.n_force_qs;
> > > +	rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> > >  	rdp->blimit = blimit;
> > >  	rdp->dynticks_nesting = 1;	/* CPU not up, no tearing. */
> > >  	rcu_dynticks_eqs_online();