On Mon, Sep 19, 2022 at 12:33:23PM +0800, Pingfan Liu wrote:
> On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index ef6d3ae239b9..e5afc63bd97f 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> > >  		    cpu != outgoingcpu)
> > >  			cpumask_set_cpu(cpu, cm);
> > >  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > > +	/*
> > > +	 * For concurrent offlining, the bit in qsmaskinitnext is not
> > > +	 * cleared yet, so resort to cpu_dying_mask, whose changes are
> > > +	 * already visible.
> > > +	 */
> > > +	if (outgoingcpu != -1)
> > > +		cpumask_andnot(cm, cm, cpu_dying_mask);
> >
> > I'm not sure how the infrastructure changes in your concurrent down
> > patchset, but can cpu_dying_mask concurrently change at this stage?
> >
>
> For the concurrent down patchset [1], it extends the cpu_down()
> capability to let an initiator tear down several CPUs in a batch
> and in parallel.
>
> As the first step, all CPUs to be torn down go through
> cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU); that way, they are
> set in the bitmap cpu_dying_mask [2]. Then the CPU hotplug kthread on
> each teardown CPU can be kicked to work. (Indeed, [2] has a bug, and I
> need to fix it by using a second loop to call
> cpuhp_kick_ap_work_async(cpu).)

So if I understand correctly, there is a synchronization point for all
CPUs between cpuhp_set_state() and CPUHP_AP_RCUTREE_ONLINE? And what about
rollbacks through cpuhp_reset_state()?

Thanks.
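
For concreteness, the two-phase sequence Pingfan describes might look
roughly like the sketch below. This is not the patchset's actual code:
cpus_down_batch() is an invented name, the sketch assumes it lives in
kernel/cpu.c next to cpuhp_set_state() and the cpuhp_state per-CPU data,
and cpuhp_kick_ap_work_async() is assumed to take just the CPU number, as
in the mail above.

/*
 * Rough sketch only -- models the ordering described above: publish
 * every outgoing CPU in cpu_dying_mask first, then kick the per-CPU
 * hotplug threads in a second loop.
 */
static int cpus_down_batch(const struct cpumask *cpus)
{
	int cpu;

	/*
	 * Step 1: mark each outgoing CPU.  cpuhp_set_state() sets the
	 * CPU's bit in cpu_dying_mask as a side effect, so after this
	 * loop completes, all the bits are visible to readers of the
	 * mask (e.g. rcu_boost_kthread_setaffinity() above).
	 */
	for_each_cpu(cpu, cpus) {
		struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);

		cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU);
	}

	/*
	 * Step 2 (the separate loop mentioned in the mail): only now
	 * kick the hotplug kthreads so the teardowns run in parallel.
	 */
	for_each_cpu(cpu, cpus)
		cpuhp_kick_ap_work_async(cpu);

	return 0;
}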