On 10/2/2022 8:34 AM, Pingfan Liu wrote:
> On Sat, Oct 1, 2022 at 10:26 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>> On 9/20/22 05:23, Frederic Weisbecker wrote:
>>> On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
[...]
>>>>>> <frederic@xxxxxxxxxx> wrote:
>>>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>>>> index 438ecae6bd7e..ef6d3ae239b9 100644
>>>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>>>> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>>>>>>>>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>>>  {
>>>>>>>>  	struct task_struct *t = rnp->boost_kthread_task;
>>>>>>>> -	unsigned long mask = rcu_rnp_online_cpus(rnp);
>>>>>>>> +	unsigned long mask;
>>>>>>>>  	cpumask_var_t cm;
>>>>>>>>  	int cpu;
>>>>>>>>
>>>>>>>> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>>>  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>>>>>>>>  		return;
>>>>>>>>  	mutex_lock(&rnp->boost_kthread_mutex);
>>>>>>>> +	/*
>>>>>>>> +	 * Relying on the lock to serialize, so when onlining, the latest
>>>>>>>> +	 * qsmaskinitnext is for cpus_ptr.
>>>>>>>> +	 */
>>>>>>>> +	mask = rcu_rnp_online_cpus(rnp);
>>>>>>>>  	for_each_leaf_node_possible_cpu(rnp, cpu)
>>>>>>>>  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>>>>>>>>  		    cpu != outgoingcpu)
>>>>>>>
>>>>>>> Right but you still race against concurrent rcu_report_dead() doing:
>>>>>>>
>>>>>>>     WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>>>>>>>
>>>>>>
>>>>>> Ah. Indeed, my commit log does not describe this precisely.
>>>>>>
>>>>>> In fact, both the smp_init speedup [1] and the fast kexec reboot [2]
>>>>>> still use the same model: one hotplug initiator and several reactors.
>>>>>> The initiators are excluded from each other by the pair
>>>>>> cpu_maps_update_begin()/cpu_maps_update_done(), so there cannot be a
>>>>>> cpu-up and a cpu-down event at the same time.
>>>>>
>>>>> Yes, but two downing CPUs may race, right?
>>>>>
>>>>> So isn't it possible to have rcu_report_dead() racing against
>>>>> rcutree_offline_cpu()?
>>>>>
>>>>
>>>> Yes, two downing cpus can be in a batch.
>>>>
>>>> There is a nuance between onlining and offlining:
>>>> -1. onlining relies on qsmaskinitnext, which is stable at the point of
>>>>     rcutree_online_cpu(). That is what this patch aims for.
>>>> -2. offlining relies on cpu_dying_mask [2/3] instead of qsmaskinitnext,
>>>>     which is not yet cleared at the point of rcutree_offline_cpu().
>>>>
>>>> It is asymmetric owing to the point at which qsmaskinitnext is updated.
>>>>
>>>> Now consider the race between rcu_report_dead() and
>>>> rcutree_offline_cpu():
>>>>
>>>> No matter whether rcutree_offline_cpu() reads out a stale or a fresh
>>>> value of qsmaskinitnext, it intersects it with cpu_dying_mask.
>>>
>>> Ah, I see now: the race can happen, but it is then cancelled by the
>>> stable READ of cpu_dying_mask. Ok.
>>>
>>
>> Maybe I am missing something, but concurrent rcu_report_dead() on CPUs
>> of the same node can corrupt ->qsmaskinitnext, as Frederic showed. That
>
> There is a pair of lock ops around WRITE_ONCE(rnp->qsmaskinitnext,
> rnp->qsmaskinitnext & ~mask), i.e. raw_spin_lock_irqsave_rcu_node(rnp,
> flags) and raw_spin_unlock_irqrestore_rcu_node(rnp, flags).
>
> So it should not be a problem, right?

Correct, I think it should not be a problem. I got thrown off a bit by
the comment "Right but you still race against concurrent
rcu_report_dead()".
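
To make that concrete, the clearing of ->qsmaskinitnext is done entirely
under the leaf rcu_node lock, so two CPUs going down on the same node
cannot lose each other's update. Roughly (a simplified sketch of
rcu_report_dead() in kernel/rcu/tree.c; the quiescent-state reporting
and gp_seq bookkeeping are omitted, see the tree for the real thing):

	void rcu_report_dead(unsigned int cpu)
	{
		unsigned long flags;
		unsigned long mask;
		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
		struct rcu_node *rnp = rdp->mynode;	/* Outgoing CPU's leaf. */

		mask = rdp->grpmask;
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
		/*
		 * The leaf rnp lock serializes concurrent writers, so
		 * concurrent downing CPUs clear their bits one at a time
		 * and the read-modify-write below is never lost.
		 */
		WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
	}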
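
And if I follow the [2/3] description above correctly, the point about a
stale read being cancelled by cpu_dying_mask would amount to filtering
in rcu_boost_kthread_setaffinity() along these lines (a hypothetical
sketch based on this thread, not the actual [2/3] patch):

	mutex_lock(&rnp->boost_kthread_mutex);
	mask = rcu_rnp_online_cpus(rnp);
	for_each_leaf_node_possible_cpu(rnp, cpu)
		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
		    cpu != outgoingcpu &&
		    /*
		     * A stale ->qsmaskinitnext may still show a dying CPU
		     * as online; the stable read of cpu_dying_mask filters
		     * it out either way.
		     */
		    !cpumask_test_cpu(cpu, cpu_dying_mask))
			cpumask_set_cpu(cpu, cm);

Since cpu_dying_mask is stable for the whole batch of downing CPUs, as
discussed above, it should not matter which value of qsmaskinitnext the
sampling races with.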
Even with concurrent offlining, a stable ->qsmaskinitnext should ensure
that the hotplugged CPUs are properly considered in rcu_gp_init(), so I
don't see a problem with your patch so far... but I'll continue to dig
when you send the next patch. My fear was that if a bit in
->qsmaskinitnext were not cleared during an offlining race, the grace
period would hang.

Thanks,

 - Joel