On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:23PM +0800, Pingfan Liu wrote:
> > rcutree_online_cpu() is concurrent, so there is the following race
> > scenario:
> >
> >         CPU 1                              CPU 2
> >   mask_old = rcu_rnp_online_cpus(rnp);
> >   ...
> >                                      mask_new = rcu_rnp_online_cpus(rnp);
> >                                      ...
> >                                      set_cpus_allowed_ptr(t, cm);
> >   ...
> >   set_cpus_allowed_ptr(t, cm);
> >
> > Consequently, the old mask will overwrite the new mask in the task's cpus_ptr.
> >
> > Since there is a mutex ->boost_kthread_mutex, use it to enforce an
> > ordering, so that the latest ->qsmaskinitnext is fetched when updating cpus_ptr.
> >
> > But for concurrent offlining, ->qsmaskinitnext is not reliable in
> > rcutree_offline_cpu(). That is another story and is addressed in the
> > following patch.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> > Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
> > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > Cc: Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx>
> > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> > Cc: "Jason A. Donenfeld" <Jason@xxxxxxxxx>
> > To: rcu@xxxxxxxxxxxxxxx
> > ---
> >  kernel/rcu/tree_plugin.h | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 438ecae6bd7e..ef6d3ae239b9 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >  {
> >  	struct task_struct *t = rnp->boost_kthread_task;
> > -	unsigned long mask = rcu_rnp_online_cpus(rnp);
> > +	unsigned long mask;
> >  	cpumask_var_t cm;
> >  	int cpu;
> >
> > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> >  		return;
> >  	mutex_lock(&rnp->boost_kthread_mutex);
> > +	/*
> > +	 * Relying on the lock to serialize, so when onlining, the latest
> > +	 * qsmaskinitnext is for cpus_ptr.
> > +	 */
> > +	mask = rcu_rnp_online_cpus(rnp);
> >  	for_each_leaf_node_possible_cpu(rnp, cpu)
> >  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> >  		    cpu != outgoingcpu)
>
> Right but you still race against concurrent rcu_report_dead() doing:
>
> 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>

Ah. Indeed, my commit log does not describe this precisely. In fact,
both speedup smp_init [1] and fast kexec reboot [2] still use the
model of one hotplug initiator and several reactors. The initiators
are still serialized against each other by the pair
cpu_maps_update_begin()/cpu_maps_update_done(). So there cannot be a
cpu-up event and a cpu-down event at the same time.

Thanks,

	Pingfan
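
For illustration, below is a minimal user-space sketch (plain C with
pthreads, not kernel code) of the serialization pattern the patch
relies on: the shared mask is snapshotted *inside* the mutex, so a
stale snapshot taken before the lock can never overwrite a newer one.
Names such as fake_qsmaskinitnext and fake_cpus_ptr are hypothetical
stand-ins for rnp->qsmaskinitnext and the kthread's cpus_ptr.

	#include <pthread.h>
	#include <stdio.h>

	static unsigned long fake_qsmaskinitnext;  /* models rnp->qsmaskinitnext */
	static unsigned long fake_cpus_ptr;        /* models the kthread's cpus_ptr */
	static pthread_mutex_t boost_kthread_mutex = PTHREAD_MUTEX_INITIALIZER;

	/*
	 * Models the fixed rcu_boost_kthread_setaffinity(): the mask is
	 * read under the lock, so each writer applies the latest value.
	 */
	static void set_affinity_serialized(int onlining_cpu)
	{
		unsigned long mask;

		pthread_mutex_lock(&boost_kthread_mutex);
		fake_qsmaskinitnext |= 1UL << onlining_cpu; /* CPU comes online */
		mask = fake_qsmaskinitnext;                 /* latest snapshot */
		fake_cpus_ptr = mask;           /* models set_cpus_allowed_ptr() */
		pthread_mutex_unlock(&boost_kthread_mutex);
	}

	static void *online_cpu_thread(void *arg)
	{
		set_affinity_serialized((int)(long)arg);
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;

		/* Two concurrent "rcutree_online_cpu()" reactors. */
		pthread_create(&t1, NULL, online_cpu_thread, (void *)1L);
		pthread_create(&t2, NULL, online_cpu_thread, (void *)2L);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);

		/* Both bits always end up set; a pre-lock snapshot could lose one. */
		printf("cpus_ptr = %#lx\n", fake_cpus_ptr);
		return 0;
	}

If the snapshot were taken before pthread_mutex_lock(), as in the
pre-patch code, one thread's older mask could overwrite the other's
newer one, which is exactly the race the commit log describes.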