On Tue, Sep 20, 2022 at 11:16:09AM +0800, Pingfan Liu wrote:
> On Mon, Sep 19, 2022 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> > > On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > > > >  		    cpu != outgoingcpu)
> > > > >  			cpumask_set_cpu(cpu, cm);
> > > > >  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > > > +	/*
> > > > > +	 * For concurrent offlining, the bit in qsmaskinitnext is not cleared yet.
> > > > > +	 * So resort to cpu_dying_mask, whose changes have already become visible.
> > > > > +	 */
> > > > > +	if (outgoingcpu != -1)
> > > > > +		cpumask_andnot(cm, cm, cpu_dying_mask);
> > > >
> > > > I'm not sure how the infrastructure changes in your concurrent down patchset,
> > > > but can cpu_dying_mask concurrently change at this stage?
> > > >
> > >
> > > The concurrent down patchset [1] extends cpu_down() so that an initiator
> > > can tear down several CPUs in a batch and in parallel.
> > >
> > > As the first step, all CPUs to be torn down go through
> > > cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU); that way, they are set in
> > > the bitmap cpu_dying_mask [2]. Then the CPU hotplug kthread on each
> > > teardown CPU can be kicked to work. (Indeed, [2] has a bug, and I need
> > > to fix it by using another loop to call cpuhp_kick_ap_work_async(cpu).)
> >
> > So if I understand correctly, there is a synchronization point for all
> > CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE?
> >
> 
> Yes, your understanding is right.
> 
> > And how about rollbacks through cpuhp_reset_state()?
> >
> 
> Originally, cpuhp_reset_state() was not considered in my fast kexec
> reboot series, since at that point all devices have been shut down and
> there is no way back; the reboot just ventures to move on.
> 
> But yes, as you point out, cpuhp_reset_state() makes it a challenge to
> keep cpu_dying_mask stable.
> 
> Consider that we have the following order:
> 1. when offlining
>    set_cpu_dying(true)
>    rcutree_offline_cpu()
> 2. when rolling back
>    set_cpu_dying(false)
>    rcutree_online_cpu()
> 
> The dying mask is therefore stable before the RCU routines run, and
> rnp->boost_kthread_mutex can be used to build an ordering that makes the
> latest cpu_dying_mask visible, as in [1/3].

Ok, thanks for the clarification!
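
[Editor's note: for readers following the ordering argument above, here is a
minimal user-space sketch (plain C, not kernel code) of the scheme being
discussed: the hotplug control path marks a CPU dying before the RCU offline
callback runs and clears the mark before the online callback on rollback, so
an affinity computation that drops the dying set under a lock always observes
a consistent mask. Every name below (NCPUS, dying_mask, affinity_lock,
compute_boost_affinity()) is illustrative rather than a real kernel symbol;
a uint64_t stands in for a cpumask and a pthread mutex stands in for
rnp->boost_kthread_mutex.]

/*
 * Illustrative user-space model only -- not kernel code.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NCPUS 8

static uint64_t online_mask = (1ULL << NCPUS) - 1;  /* candidate CPUs */
static uint64_t dying_mask;                         /* models cpu_dying_mask */
static pthread_mutex_t affinity_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models set_cpu_dying(): flipped by the hotplug control path *before* the
 * per-CPU offline callback (rcutree_offline_cpu()) runs, and cleared
 * *before* the online callback (rcutree_online_cpu()) on rollback.
 */
static void set_cpu_dying(int cpu, bool dying)
{
	pthread_mutex_lock(&affinity_lock);
	if (dying)
		dying_mask |= 1ULL << cpu;
	else
		dying_mask &= ~(1ULL << cpu);
	pthread_mutex_unlock(&affinity_lock);
}

/*
 * Models the affinity computation discussed above: build the candidate
 * mask, drop the outgoing CPU, then drop every CPU currently marked dying
 * (the role played by the cpumask_andnot() in the quoted patch).
 */
static uint64_t compute_boost_affinity(int outgoingcpu)
{
	uint64_t cm;

	pthread_mutex_lock(&affinity_lock);
	cm = online_mask;
	if (outgoingcpu >= 0) {
		cm &= ~(1ULL << outgoingcpu);
		cm &= ~dying_mask;	/* concurrently dying CPUs filtered too */
	}
	pthread_mutex_unlock(&affinity_lock);
	return cm;
}

int main(void)
{
	/* CPUs 2 and 3 are torn down in a batch; CPU 3 is the one whose
	 * offline callback is currently running. */
	set_cpu_dying(2, true);
	set_cpu_dying(3, true);
	printf("affinity while offlining: %#llx\n",
	       (unsigned long long)compute_boost_affinity(3));

	/* Rollback: the dying bits are cleared before the online callbacks. */
	set_cpu_dying(2, false);
	set_cpu_dying(3, false);
	printf("affinity after rollback:  %#llx\n",
	       (unsigned long long)compute_boost_affinity(-1));
	return 0;
}

The lock here plays the role Pingfan describes for rnp->boost_kthread_mutex:
because the dying bits are set or cleared before the corresponding RCU
callbacks run, a reader that takes the lock inside the callback path sees the
latest dying mask.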