On Thu, Dec 01, 2022 at 07:45:33AM +0800, Zqiang wrote:
> Currently, invoking rcu_tasks_rude_wait_gp() to wait for one rude
> RCU-Tasks grace period returns immediately if __num_online_cpus == 1,
> which is taken to indicate the end of the rude RCU-Tasks grace period.
> Suppose the system has two CPUs and consider the following scenario:
>
>         CPU0                                   CPU1 (going offline)
>                                           migration/1 task:
>                                             cpu_stopper_thread
>                                              -> take_cpu_down
>                                                 -> _cpu_disable
>                                                  (dec __num_online_cpus)
>                                                 -> cpuhp_invoke_callback
>                                                       preempt_disable
>                                                       access old_data0
>         task1
>         del old_data0                                 .....
>         synchronize_rcu_tasks_rude()
>         task1 schedule out
>         ....
>         task2 schedule in
>         rcu_tasks_rude_wait_gp()
>             -> __num_online_cpus == 1
>                -> return
>         ....
>         task1 schedule in
>         -> free old_data0
>                                                       preempt_enable
>
> When CPU1 decrements __num_online_cpus down to one, CPU1 has not yet
> finished going offline: the stop_machine task (migration/1) is still
> running on CPU1 and may still be accessing 'old_data0', but 'old_data0'
> has already been freed on CPU0.
>
> In order to prevent the above scenario from happening, this commit
> removes the check for __num_online_cpus == 1 and adds handling for
> calling synchronize_rcu_tasks_generic() during early boot (when the
> rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE, the scheduler
> is not yet initialized and only the boot CPU is online).
>
> Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>

>Very good, thank you!  I did the usual wordsmithing, including to that
>error message, so as usual please check to make sure that I didn't mess
>something up.
>
>							Thanx, Paul
>
>------------------------------------------------------------------------
>
>commit 033ddc5d337984e20b9d49c8af4faa4689727626
>Author: Zqiang <qiang1.zhang@xxxxxxxxx>
>Date:   Thu Dec 1 07:45:33 2022 +0800
>
>    rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug
>
>    The synchronize_rcu_tasks_rude() function invokes rcu_tasks_rude_wait_gp()
>    to wait one rude RCU-tasks grace period.  The rcu_tasks_rude_wait_gp()
>    function in turn checks if there is only a single online CPU.  If so, it
>    will immediately return, because a call to synchronize_rcu_tasks_rude()
>    is by definition a grace period on a single-CPU system.  (We could
>    have blocked!)
>
>    Unfortunately, this check uses num_online_cpus() without synchronization,
>    which can result in too-short grace periods.  To see this, consider the
>    following scenario:
>
>            CPU0                                   CPU1 (going offline)
>                                              migration/1 task:
>                                                cpu_stopper_thread
>                                                 -> take_cpu_down
>                                                    -> _cpu_disable
>                                                     (dec __num_online_cpus)
>                                                    -> cpuhp_invoke_callback
>                                                          preempt_disable
>                                                          access old_data0
>            task1
>            del old_data0                                    .....
>            synchronize_rcu_tasks_rude()
>            task1 schedule out
>            ....
>            task2 schedule in
>            rcu_tasks_rude_wait_gp()
>                -> __num_online_cpus == 1
>                   -> return
>            ....
>            task1 schedule in
>            -> free old_data0
>                                                          preempt_enable
>
>    When CPU1 decrements __num_online_cpus, its value becomes 1.  However,
>    CPU1 has not finished going offline, and will take one last trip through
>    the scheduler and the idle loop before it actually stops executing
>    instructions.  Because synchronize_rcu_tasks_rude() is mostly used for
>    tracing, and because both the scheduler and the idle loop can be traced,
>    this means that CPU0's prematurely ended grace period might disrupt the
>    tracing on CPU1.  Given that this disruption might include CPU1 executing
>    instructions in memory that was just now freed (and maybe reallocated),
>    this is a matter of some concern.
>
>    This commit therefore removes that problematic single-CPU check from the
>    rcu_tasks_rude_wait_gp() function.
>    This dispenses with the single-CPU optimization, but there is no
>    evidence indicating that this optimization is important.  In addition,
>    synchronize_rcu_tasks_generic() contains a similar optimization (albeit
>    only for early boot), which also splats.  (As in exactly why are you
>    invoking synchronize_rcu_tasks_rude() so early in boot, anyway???)
>
>    It is OK for the synchronize_rcu_tasks_rude() function's check to be
>    unsynchronized because the only times that this check can evaluate to
>    true is when there is only a single CPU running with preemption
>    disabled.
>
>    While in the area, this commit also fixes a minor bug in which a
>    call to synchronize_rcu_tasks_rude() would instead be attributed to
>    synchronize_rcu_tasks().
>
>    [ paulmck: Add "synchronize_" prefix and "()" suffix. ]
>
>    Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
>    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
>diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>index 4dda8e6e5707f..d845723c1af41 100644
>--- a/kernel/rcu/tasks.h
>+++ b/kernel/rcu/tasks.h
>@@ -560,8 +560,9 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
> {
> 	/* Complain if the scheduler has not started.  */
>-	WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>-		  "synchronize_rcu_tasks called too soon");
>+	if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
>+		      "synchronize_%s() called too soon", rtp->name))
>+		return;

Thanks Paul, detailed description and modification 😊.

>
> 	// If the grace-period kthread is running, use it.
> 	if (READ_ONCE(rtp->kthread_ptr)) {
>@@ -1064,9 +1065,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> // Wait for one rude RCU-tasks grace period.
> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> {
>-	if (num_online_cpus() <= 1)
>-		return;	// Fastpath for only one CPU.
>-
> 	rtp->n_ipis += cpumask_weight(cpu_online_mask);
> 	schedule_on_each_cpu(rcu_tasks_be_rude);
> }
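
For readers following along, the failure mode is easiest to see from the
updater side.  Below is a minimal, hypothetical sketch (not taken from this
patch; struct, variable, and function names are made up for illustration) of
the usual remove-then-wait-then-free pattern that relies on
synchronize_rcu_tasks_rude(), mirroring the "del old_data0 ... free
old_data0" steps in the scenario diagram.  It assumes a kernel built with
CONFIG_TASKS_RUDE_RCU.

/*
 * Illustrative sketch only -- not part of this patch.  The names
 * demo_data, old_data0, and demo_retire_old_data() are hypothetical.
 */
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct demo_data {
	int val;
};

/* Readers access this pointer from preempt-disabled (e.g. traced) code. */
static struct demo_data __rcu *old_data0;

static void demo_retire_old_data(void)
{
	struct demo_data *p;

	/* "del old_data0": unpublish the item so new readers cannot find it. */
	p = rcu_replace_pointer(old_data0, NULL, true);
	if (!p)
		return;

	/*
	 * Rude RCU-Tasks grace period: schedule_on_each_cpu() forces a
	 * context switch on every online CPU, so any code running with
	 * preemption disabled (such as the cpuhp callback on the outgoing
	 * CPU in the scenario above) must have completed.  If this grace
	 * period ends too soon, the kfree() below frees memory that the
	 * outgoing CPU may still be accessing.
	 */
	synchronize_rcu_tasks_rude();

	/* "free old_data0": now safe, no CPU can still reference it. */
	kfree(p);
}

The outgoing CPU in the diagram is exactly the kind of preempt-disabled
"reader" that this pattern counts on the grace period to wait for, which is
why the unsynchronized num_online_cpus() fastpath could not be used safely.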