On 9/20/22 05:23, Frederic Weisbecker wrote:
> On Tue, Sep 20, 2022 at 11:45:45AM +0800, Pingfan Liu wrote:
>> On Mon, Sep 19, 2022 at 12:51:49PM +0200, Frederic Weisbecker wrote:
>>> On Mon, Sep 19, 2022 at 06:24:21PM +0800, Pingfan Liu wrote:
>>>> On Fri, Sep 16, 2022 at 10:52 PM Frederic Weisbecker
>>>> <frederic@xxxxxxxxxx> wrote:
>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>> index 438ecae6bd7e..ef6d3ae239b9 100644
>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>> @@ -1224,7 +1224,7 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
>>>>>>  static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>  {
>>>>>>  	struct task_struct *t = rnp->boost_kthread_task;
>>>>>> -	unsigned long mask = rcu_rnp_online_cpus(rnp);
>>>>>> +	unsigned long mask;
>>>>>>  	cpumask_var_t cm;
>>>>>>  	int cpu;
>>>>>>
>>>>>> @@ -1233,6 +1233,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
>>>>>>  	if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
>>>>>>  		return;
>>>>>>  	mutex_lock(&rnp->boost_kthread_mutex);
>>>>>> +	/*
>>>>>> +	 * Relying on the lock to serialize, so when onlining, the latest
>>>>>> +	 * qsmaskinitnext is for cpus_ptr.
>>>>>> +	 */
>>>>>> +	mask = rcu_rnp_online_cpus(rnp);
>>>>>>  	for_each_leaf_node_possible_cpu(rnp, cpu)
>>>>>>  		if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
>>>>>>  		    cpu != outgoingcpu)
>>>>>
>>>>> Right but you still race against concurrent rcu_report_dead() doing:
>>>>>
>>>>>      WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask)
>>>>>
>>>>
>>>> Ah. Indeed, my commit log does not describe this precisely.
>>>>
>>>> In fact, both speedup smp_init [1] and fast kexec reboot [2] still
>>>> use the model of one hotplug initiator and several reactors. The
>>>> initiators are still excluded from each other by the pair
>>>> cpu_maps_update_begin()/cpu_maps_update_done(), so there cannot be a
>>>> cpu-up and a cpu-down event at the same time.
>>>
>>> Yes but two downing CPUs may race, right?
>>>
>>> So isn't it possible to have rcu_report_dead() racing against
>>> rcutree_offline_cpu()?
>>>
>>
>> Yes, two downing cpus can be in the same batch.
>>
>> There is a nuance between onlining and offlining:
>> -1. onlining relies on qsmaskinitnext, which is stable at the point of
>>     rcutree_online_cpu(). That is what this patch aims for.
>> -2. offlining relies on cpu_dying_mask [2/3], instead of qsmaskinitnext,
>>     which is not cleared yet at the point of rcutree_offline_cpu().
>>
>> It is asymmetric owing to the point at which qsmaskinitnext is updated.
>>
>> Now considering the racing between rcu_report_dead() and
>> rcutree_offline_cpu():
>>
>> No matter whether rcutree_offline_cpu() reads out a stale or a fresh value
>> of qsmaskinitnext, the result is intersected with cpu_dying_mask.
>
> Ah I see now, so the race can happen but it is then cancelled by the
> stable READ on cpu_dying_mask. Ok.
>

Maybe I am missing something, but concurrent rcu_report_dead() calls on CPUs
of the same node can corrupt ->qsmaskinitnext as Frederic showed. That may
not be a problem for affinities (due to cpu_dying_mask), but is it not an
issue for RCU operation itself? I.e., due to the corruption, ->qsmaskinitnext
is left in an inconsistent state, causing *RCU* issues.

Or is there a reason that this is not an issue?

thanks,

- Joel
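
As a side note, here is a minimal standalone sketch of the lost-update
pattern in question. This is plain C with pthreads, not kernel code, and all
names are made up for illustration; it only shows why a non-atomic
read-modify-write such as
WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask) can drop one of
two concurrent bit-clears when the writers do not share a lock. Whether tree
RCU is actually exposed to this depends on the locking around
rcu_report_dead(), which is exactly the question above.

/*
 * Standalone illustration (not kernel code): two threads each try to
 * clear their own bit in a shared mask using a plain read-modify-write
 * with no common lock.  One of the two clears can be lost.
 */
#include <pthread.h>
#include <stdio.h>

static unsigned long qsmask = 0x3;	/* bits for two hypothetical "CPUs" */
static pthread_barrier_t barrier;

static void *clear_bit_racy(void *arg)
{
	unsigned long my_bit = (unsigned long)arg;
	unsigned long old;

	/* Line both threads up so the race window is easy to hit. */
	pthread_barrier_wait(&barrier);

	old = qsmask;			/* read ...                          */
	/* ... the other thread may read the same old value here ...        */
	qsmask = old & ~my_bit;		/* ... modify, write: not atomic     */
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_barrier_init(&barrier, NULL, 2);
	pthread_create(&t1, NULL, clear_bit_racy, (void *)0x1UL);
	pthread_create(&t2, NULL, clear_bit_racy, (void *)0x2UL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* Expected 0x0; may print 0x1 or 0x2 when an update is lost. */
	printf("qsmask = %#lx\n", qsmask);
	return 0;
}

Holding a common lock around the read-modify-write (or using an atomic
and-not) removes the window, which is why the serialization point matters
for the discussion above.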