On Fri, Sep 16, 2022 at 10:24 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> On Thu, Sep 15, 2022 at 01:58:24PM +0800, Pingfan Liu wrote:
> > During offlining, concurrent rcutree_offline_cpu() calls cannot be
> > aware of each other through ->qsmaskinitnext. But cpu_dying_mask
> > carries that information at that point and can be utilized.
> >
> > Besides, a trivial change removes the redundant call to
> > rcu_boost_kthread_setaffinity() in rcutree_dead_cpu(), since
> > rcutree_offline_cpu() can fully serve that purpose.
> >
> > Signed-off-by: Pingfan Liu <kernelfans@xxxxxxxxx>
> > Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> > Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
> > Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > Cc: Neeraj Upadhyay <quic_neeraju@xxxxxxxxxxx>
> > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> > Cc: "Jason A. Donenfeld" <Jason@xxxxxxxxx>
> > To: rcu@xxxxxxxxxxxxxxx
> > ---
> >  kernel/rcu/tree.c        | 2 --
> >  kernel/rcu/tree_plugin.h | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 79aea7df4345..8a829b64f5b2 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2169,8 +2169,6 @@ int rcutree_dead_cpu(unsigned int cpu)
> >  		return 0;
> >
> >  	WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus - 1);
> > -	/* Adjust any no-longer-needed kthreads. */
> > -	rcu_boost_kthread_setaffinity(rnp, -1);
> >  	// Stop-machine done, so allow nohz_full to disable tick.
> >  	tick_dep_clear(TICK_DEP_BIT_RCU);
> >  	return 0;
>
> I would suggest making this a separate change, for bisectability and
> readability.
>

OK, I will.

> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index ef6d3ae239b9..e5afc63bd97f 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -1243,6 +1243,12 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
> >  		    cpu != outgoingcpu)
> >  			cpumask_set_cpu(cpu, cm);
> >  	cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
> > +	/*
> > +	 * During a concurrent offlining, the bit in ->qsmaskinitnext may
> > +	 * not be cleared yet. So resort to cpu_dying_mask, whose changes
> > +	 * are already visible.
> > +	 */
> > +	if (outgoingcpu != -1)
> > +		cpumask_andnot(cm, cm, cpu_dying_mask);
>
> I'm not sure how the infrastructure changes in your concurrent down
> patchset, but can cpu_dying_mask concurrently change at this stage?
>

The concurrent down patchset [1] extends cpu_down() so that an
initiator can tear down several CPUs in a batch and in parallel. As the
first step, every CPU to be torn down goes through
cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU), which sets its bit in
cpu_dying_mask [2]. Only after that is the cpu hotplug kthread on each
teardown CPU kicked to do the work. (Indeed, [2] has a bug; I need to
fix it by using a separate loop to call cpuhp_kick_ap_work_async(cpu).)

At the outermost level, the pair
cpu_maps_update_begin()/cpu_maps_update_done() still prevents any new
initiator from launching another concurrent hot-add/remove operation.
So cpu_dying_mask stays stable during the batched, concurrent CPU
teardown.
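To make that ordering concrete, here is a rough sketch of the flow as
described above. It is only an illustration: the entry point name
cpus_down() is hypothetical, cpuhp_kick_ap_work_async() comes from the
patchset [1] rather than mainline, and error handling plus the
wait-for-completion step are elided.

	int cpus_down(const struct cpumask *cpus)
	{
		unsigned int cpu;

		/* Serialize against any other hotplug initiator. */
		cpu_maps_update_begin();

		/*
		 * Step 1: mark every outgoing CPU in cpu_dying_mask
		 * before any teardown work starts, so the mask is
		 * complete by the time rcu_boost_kthread_setaffinity()
		 * consults it.
		 */
		for_each_cpu(cpu, cpus) {
			struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);

			cpuhp_set_state(cpu, st, CPUHP_TEARDOWN_CPU);
		}

		/* Step 2: kick the per-cpu hotplug kthreads in parallel. */
		for_each_cpu(cpu, cpus)
			cpuhp_kick_ap_work_async(cpu);

		/* ... wait for all the teardowns to complete ... */

		cpu_maps_update_done();
		return 0;
	}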
[1]: https://lore.kernel.org/all/20220822021520.6996-1-kernelfans@xxxxxxxxx/
[2]: https://lore.kernel.org/all/20220822021520.6996-4-kernelfans@xxxxxxxxx/

Thanks,

Pingfan