On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
> On Mon, Sep 19, 2022 at 12:51:49PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
> > > <frederic@xxxxxxxxxx> wrote:
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index 438ecae6bd7e..ef6d3ae239b9 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
> > > > >  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >  {
> > > > >  	struct task_struct *t = rnp->boost_kthread_task;
> > > > > -	unsigned long mask = rcu_rnp_online_cpus(rnp);
> > > > > +	unsigned long mask;
> > > > >  	cpumask_var_t cm;
> > > > >  	int cpu;
> > > > > 
> > > > > @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
> > > > >  		return;
> > > > >  	mutex_lock(&rnp->boost_kthread_mutex);
> > > > > +	/*
> > > > > +	 * Relying on the lock to serialize, so when onlining, the latest
> > > > > +	 * qsmaskinitnext is for cpus_ptr.
> > > > > +	 */
> > > > > +	mask = rcu_rnp_online_cpus(rnp);
> > > > >  	for_each_leaf_node_possible_cpu(rnp, cpu)
> > > > >  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
> > > > >  		    cpu != outgoingcpu)
> > > > 
> > > > Right, but you still race against a concurrent rcu_report_dead() doing:
> > > > 
> > > >         WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
> > > > 
> > > 
> > > Ah. Indeed, my commit log does not describe this precisely.
> > > 
> > > In fact, both the smp_init speedup [1] and the fast kexec reboot [2]
> > > still use the same model: one hotplug initiator and several reactors.
> > > The initiators are still excluded from each other by the pair
> > > cpu_maps_update_begin()/cpu_maps_update_done(), so there cannot be a
> > > cpu-up and a cpu-down event at the same time.
> > 
> > Yes, but two downing CPUs may race, right?
> > 
> > So isn't it possible to have rcu_report_dead() racing against
> > rcutree_offline_cpu()?
> > 
> 
> Yes, two downing CPUs can be in the same batch.
> 
> There is a nuance between onlining and offlining:
> -1. Onlining relies on qsmaskinitnext, which is stable by the time
> rcutree_online_cpu() runs. That is what this patch aims at.
> -2. Offlining relies on cpu_dying_mask [2/3] instead of qsmaskinitnext,
> which has not been cleared yet at the point where rcutree_offline_cpu()
> runs.
> 
> The asymmetry is owing to the point at which qsmaskinitnext is updated.
> 
> Now, considering the race between rcu_report_dead() and
> rcutree_offline_cpu():
> 
> No matter whether rcutree_offline_cpu() reads a stale or a fresh value
> of qsmaskinitnext, the result still gets intersected with
> cpu_dying_mask.

Ah, I see now: the race can happen, but it is then cancelled by the
stable READ on cpu_dying_mask. Ok.

> 
> Thanks,
> 
> Pingfan
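
To make the cancellation argument concrete, here is a minimal sketch of
the offline-side affinity computation, assuming the cpu_dying_mask
filtering described for [2/3]. The function name is hypothetical and
this is not the actual patch; it only illustrates why a stale
qsmaskinitnext sample is harmless:

/*
 * Minimal sketch, not the actual [2/3] patch: even if a concurrent
 * rcu_report_dead() clears bits in rnp->qsmaskinitnext right after we
 * sample it, every CPU going down in this hotplug batch is already set
 * in cpu_dying_mask, which stays stable for the whole batch. Masking
 * out cpu_dying_mask therefore cancels any staleness in the sampled
 * value.
 */
static void boost_affinity_sketch(struct rcu_node *rnp, int outgoingcpu)
{
	unsigned long mask = rcu_rnp_online_cpus(rnp);	/* possibly stale */
	cpumask_var_t cm;
	int cpu;

	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
		return;

	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
		    cpu != outgoingcpu)
			cpumask_set_cpu(cpu, cm);

	/* Stable read: drop every CPU that this batch is taking down. */
	cpumask_andnot(cm, cm, cpu_dying_mask);

	if (cpumask_empty(cm))		/* fall back if nothing is left */
		cpumask_copy(cm, cpu_online_mask);

	set_cpus_allowed_ptr(rnp->boost_kthread_task, cm);
	free_cpumask_var(cm);
}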