Re: [PATCH v4] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Fri, 2 Dec 2022 08:25:13 -0500

> On Dec 2, 2022, at 7:52 AM, Zhang, Qiang1 <qiang1.zhang@xxxxxxxxx> wrote:
> 
> On Thu, Dec 01, 2022 at 07:45:33AM +0800, Zqiang wrote:
>> Currently, invoke rcu_tasks_rude_wait_gp() to wait one rude
>> RCU-tasks grace period, if __num_online_cpus == 1, will return
>> directly, indicates the end of the rude RCU-task grace period.
>> suppose the system has two cpus, consider the following scenario:
>> 
>>        CPU0                                   CPU1 (going offline)
>>                                          migration/1 task:
>>                                      cpu_stopper_thread
>>                                       -> take_cpu_down
>>                                          -> _cpu_disable
>>                                           (dec __num_online_cpus)
>>                                          ->cpuhp_invoke_callback
>>                                                preempt_disable
>>                                                access old_data0
>>           task1
>> del old_data0                                  .....
>> synchronize_rcu_tasks_rude()
>> task1 schedule out
>> ....
>> task2 schedule in
>> rcu_tasks_rude_wait_gp()
>>     ->__num_online_cpus == 1
>>       ->return
>> ....
>> task1 schedule in
>> ->free old_data0
>>                                                preempt_enable
>> 
>> when CPU1 dec __num_online_cpus and __num_online_cpus is equal one,
>> the CPU1 has not finished offline, stop_machine task(migration/1)
>> still running on CPU1, maybe still accessing 'old_data0', but the
>> 'old_data0' has freed on CPU0.
>> 
>> In order to prevent the above scenario from happening, this commit
>> remove check for __num_online_cpus == 0 and add handling of calling
>> synchronize_rcu_tasks_generic() during early boot(when the
>> rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE, the scheduler
>> not yet initialized and only one boot-CPU online).
>> 
>> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> 
>> Very good, thank you!  I did the usual wordsmithing, including to that
>> error message, so as usual please check to make sure that I didn't mess
>> something up.
>> 
>>                            Thanx, Paul
>> 
>> ------------------------------------------------------------------------
>> 
>> commit 033ddc5d337984e20b9d49c8af4faa4689727626
>> Author: Zqiang <qiang1.zhang@xxxxxxxxx>
>> Date:   Thu Dec 1 07:45:33 2022 +0800
>> 
>>   rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
>> 
>>   The synchronize_rcu_tasks_rude() function invokes rcu_tasks_rude_wait_gp()
>>   to wait one rude RCU-tasks grace period.  The rcu_tasks_rude_wait_gp()
>>   function in turn checks if there is only a single online CPU.  If so, it
>>   will immediately return, because a call to synchronize_rcu_tasks_rude()
>>   is by definition a grace period on a single-CPU system.  (We could
>>   have blocked!)
>> 
>>   Unfortunately, this check uses num_online_cpus() without synchronization,
>>   which can result in too-short grace periods.  To see this, consider the
>>   following scenario:
>> 
>>           CPU0                                   CPU1 (going offline)
>>                                             migration/1 task:
>>                                         cpu_stopper_thread
>>                                          -> take_cpu_down
>>                                             -> _cpu_disable
>>                                              (dec __num_online_cpus)
>>                                            ->cpuhp_invoke_callback
>>                                                   preempt_disable
>>                                                   access old_data0
>>              task1
>>    del old_data0                                  .....
>>    synchronize_rcu_tasks_rude()
>>    task1 schedule out
>>    ....
>>    task2 schedule in
>>    rcu_tasks_rude_wait_gp()
>>        ->__num_online_cpus == 1
>>          ->return
>>    ....
>>    task1 schedule in
>>    ->free old_data0
>>                                                   preempt_enable
>> 
>>   When CPU1 decrements __num_online_cpus, its value becomes 1.  However,
>>   CPU1 has not finished going offline, and will take one last trip through
>>   the scheduler and the idle loop before it actually stops executing
>>   instructions.  Because synchronize_rcu_tasks_rude() is mostly used for
>>   tracing, and because both the scheduler and the idle loop can be traced,
>>   this means that CPU0's prematurely ended grace period might disrupt the
>>   tracing on CPU1.  Given that this disruption might include CPU1 executing
>>   instructions in memory that was just now freed (and maybe reallocated),
>>   this is a matter of some concern.
>> 
>>   This commit therefore removes that problematic single-CPU check from the
>>   rcu_tasks_rude_wait_gp() function.  This dispenses with the single-CPU
>>   optimization, but there is no evidence indicating that this optimization
>>   is important.  In addition, synchronize_rcu_tasks_generic() contains a
>>   similar optimization (albeit only for early boot), which also splats.
>>   (As in exactly why are you invoking synchronize_rcu_tasks_rude() so
>>   early in boot, anyway???)
>> 
>>   It is OK for the synchronize_rcu_tasks_rude() function's check to be
>>   unsynchronized because the only times that this check can evaluate to
>>   true is when there is only a single CPU running with preemption
>>   disabled.
>> 
>>   While in the area, this commit also fixes a minor bug in which a
>>   call to synchronize_rcu_tasks_rude() would instead be attributed to
>>   synchronize_rcu_tasks().
>> 
>>   [ paulmck: Add "synchronize_" prefix and "()" suffix. ]
>> 
>>   Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
>>   Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>> 
>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>> index 4dda8e6e5707f..d845723c1af41 100644
>> --- a/kernel/rcu/tasks.h
>> +++ b/kernel/rcu/tasks.h
>> @@ -560,8 +560,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>> static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
>> {
>>    /* Complain if the scheduler has not started.  */
>> -    WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>> -             "synchronize_rcu_tasks called too soon");
>> +    if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>> +             "synchronize_%s() called too soon", rtp->name))
> 
> Thanks Paul,  detailed description and modification 😊.

True statement. Good example for everyone ;-)

 - Joel

> 
> 
>> +        return;
>> 
>>    // If the grace-period kthread is running, use it.
>>    if (READ_ONCE(rtp->kthread_ptr)) {
>> @@ -1064,9 +1065,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>> // Wait for one rude RCU-tasks grace period.
>> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>> {
>> -    if (num_online_cpus() <= 1)
>> -        return;    // Fastpath for only one CPU.
>> -
>>    rtp->n_ipis += cpumask_weight(cpu_online_mask);
>>    schedule_on_each_cpu(rcu_tasks_be_rude);
>> }