On 10/2/2022 8:34 AM, Pingfan Liu wrote:
> On Sat, Oct 1, 2022 at 10:26 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>> On 9/20/22 05:23, Frederic Weisbecker wrote:
>>> On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
[...]
>>>>>> <frederic@xxxxxxxxxx> wrote:
>>>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>>>> index 438ecae6bd7e..ef6d3ae239b9 100644
>>>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>>>> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>>>>>>>>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>>>  {
>>>>>>>>  	struct task_struct *t = rnp->boost_kthread_task;
>>>>>>>> -	unsigned long mask = rcu_rnp_online_cpus(rnp);
>>>>>>>> +	unsigned long mask;
>>>>>>>>  	cpumask_var_t cm;
>>>>>>>>  	int cpu;
>>>>>>>>
>>>>>>>> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>>>  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>>>>>>>>  		return;
>>>>>>>>  	mutex_lock(&rnp->boost_kthread_mutex);
>>>>>>>> +	/*
>>>>>>>> +	 * Relying on the lock to serialize, so when onlining, the latest
>>>>>>>> +	 * qsmaskinitnext is for cpus_ptr.
>>>>>>>> +	 */
>>>>>>>> +	mask = rcu_rnp_online_cpus(rnp);
>>>>>>>>  	for_each_leaf_node_possible_cpu(rnp, cpu)
>>>>>>>>  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>>>>>>>>  		    cpu != outgoingcpu)
>>>>>>>
>>>>>>> Right but you still race against concurrent rcu_report_dead() doing:
>>>>>>>
>>>>>>>     WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>>>>>>>
>>>>>>
>>>>>> Ah. Indeed, my commit log does not describe this precisely.
>>>>>>
>>>>>> In fact, both the smp_init speedup [1] and the fast kexec reboot [2]
>>>>>> still use the same model: one hotplug initiator and several reactors.
>>>>>> The initiators are excluded from each other by the pair
>>>>>> cpu_maps_update_begin()/cpu_maps_update_done(), so there cannot be a
>>>>>> cpu-up and a cpu-down event at the same time.
>>>>>
>>>>> Yes, but two downing CPUs may race, right?
>>>>>
>>>>> So isn't it possible to have rcu_report_dead() racing against
>>>>> rcutree_offline_cpu()?
>>>>>
>>>>
>>>> Yes, two downing cpus can be in a batch.
>>>>
>>>> There is a nuance between onlining and offlining:
>>>> -1. onlining relies on qsmaskinitnext, which is stable at the point of
>>>>     rcutree_online_cpu(). That is what this patch aims for.
>>>> -2. offlining relies on cpu_dying_mask [2/3] instead of qsmaskinitnext,
>>>>     which is not yet cleared at the point of rcutree_offline_cpu().
>>>>
>>>> It is asymmetric owing to the point at which qsmaskinitnext is updated.
>>>>
>>>> Now consider the race between rcu_report_dead() and
>>>> rcutree_offline_cpu():
>>>>
>>>> No matter whether rcutree_offline_cpu() reads out a stale or a fresh
>>>> value of qsmaskinitnext, it intersects it with cpu_dying_mask.
>>>
>>> Ah, I see now: the race can happen, but it is then cancelled by the
>>> stable READ of cpu_dying_mask. Ok.
>>>
>>
>> Maybe I am missing something, but concurrent rcu_report_dead() on CPUs
>> of the same node can corrupt ->qsmaskinitnext, as Frederic showed. That
>
> There is a pair of lock ops around WRITE_ONCE(rnp->qsmaskinitnext,
> rnp->qsmaskinitnext & ~mask), i.e. raw_spin_lock_irqsave_rcu_node(rnp,
> flags) and raw_spin_unlock_irqrestore_rcu_node(rnp, flags).
>
> So it should not be a problem, right?

Correct, I think it should not be a problem. I got thrown off a bit by
the comment "Right but you still race against concurrent
rcu_report_dead()".
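
To make that concrete, the clearing of ->qsmaskinitnext is done entirely
under the leaf rcu_node lock, so two CPUs going down on the same node
cannot lose each other's update. Roughly (a simplified sketch of
rcu_report_dead() in kernel/rcu/tree.c; the quiescent-state reporting
and gp_seq bookkeeping are omitted, see the tree for the real thing):

	void rcu_report_dead(unsigned int cpu)
	{
		unsigned long flags;
		unsigned long mask;
		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
		struct rcu_node *rnp = rdp->mynode;	/* Outgoing CPU's leaf. */

		mask = rdp->grpmask;
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
		/*
		 * The leaf rnp lock serializes concurrent writers, so
		 * concurrent downing CPUs clear their bits one at a time
		 * and the read-modify-write below is never lost.
		 */
		WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
	}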
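
And if I follow the [2/3] description above correctly, the point about a
stale read being cancelled by cpu_dying_mask would amount to filtering
in rcu_boost_kthread_setaffinity() along these lines (a hypothetical
sketch based on this thread, not the actual [2/3] patch):

	mutex_lock(&rnp->boost_kthread_mutex);
	mask = rcu_rnp_online_cpus(rnp);
	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
		    cpu != outgoingcpu &&
		    /*
		     * A stale ->qsmaskinitnext may still show a dying CPU
		     * as online; the stable read of cpu_dying_mask filters
		     * it out either way.
		     */
		    !cpumask_test_cpu(cpu, cpu_dying_mask))
			cpumask_set_cpu(cpu, cm);

Since cpu_dying_mask is stable for the whole batch of downing CPUs, as
discussed above, it should not matter which value of qsmaskinitnext the
sampling races with.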
Even with concurrent offlining, a stable ->qsmaskinitnext should ensure
that the hotplugged CPUs are properly considered in rcu_gp_init(), so I
don't see a problem with your patch so far... but I'll continue to dig
when you send the next patch. My fear was that if a bit in
->qsmaskinitnext were not cleared during an offlining race, the grace
period would hang.

Thanks,

 - Joel