Patch "rcu/tasks: Fix stale task snaphot for Tasks Trace" has been added to the 6.1-stable tree

This is a note to let you know that I've just added the patch titled

    rcu/tasks: Fix stale task snaphot for Tasks Trace

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     rcu-tasks-fix-stale-task-snaphot-for-tasks-trace.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 38634b17c01da5020a37a49b59f21bce3ec8c347
Author: Frederic Weisbecker <frederic@xxxxxxxxxx>
Date:   Fri May 17 17:23:02 2024 +0200

    rcu/tasks: Fix stale task snaphot for Tasks Trace
    
    [ Upstream commit 399ced9594dfab51b782798efe60a2376cd5b724 ]
    
    When RCU-TASKS-TRACE pre-gp takes a snapshot of the task currently
    running on each online CPU, no explicit ordering synchronizes that
    snapshot with a concurrent context switch.  This lack of ordering can
    permit the incoming task to miss pre-grace-period update-side accesses.
    The following diagram, courtesy of Paul, shows the possible bad
    scenario:
    
            CPU 0                                           CPU 1
            -----                                           -----
    
            // Pre-GP update side access
            WRITE_ONCE(*X, 1);
            smp_mb();
            r0 = rq->curr;
                                                            RCU_INIT_POINTER(rq->curr, TASK_B)
                                                            spin_unlock(rq)
                                                            rcu_read_lock_trace()
                                                            r1 = X;
            /* ignore TASK_B */
    
    Either r0==TASK_B or r1==1 is needed but neither is guaranteed.
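    
    For context, this guarantee is what allows a Tasks Trace updater to pair
    a plain pre-GP store with readers under rcu_read_lock_trace().  A minimal
    sketch of that pairing follows; it is illustrative only (struct foo,
    global_foo, updater() and reader() are made-up names, not taken from the
    patch):
    
            #include <linux/rcupdate.h>
            #include <linux/rcupdate_trace.h>
            #include <linux/slab.h>
    
            struct foo {
                    int data;
            };
            static struct foo __rcu *global_foo;
    
            /*
             * Update side: unpublish the object, wait out all Tasks Trace
             * readers that might still see it, then free it.
             */
            static void updater(void)
            {
                    struct foo *old;
    
                    /* Pre-GP update-side access. */
                    old = rcu_replace_pointer(global_foo, NULL, true);
                    /* Waits for all pre-existing reader() critical sections. */
                    synchronize_rcu_tasks_trace();
                    kfree(old);
            }
    
            /*
             * Read side: a task in here must either be seen by the pre-GP
             * snapshot or already observe global_foo == NULL.
             */
            static int reader(void)
            {
                    struct foo *p;
                    int val = -1;
    
                    rcu_read_lock_trace();
                    p = rcu_dereference_check(global_foo, rcu_read_lock_trace_held());
                    if (p)
                            val = READ_ONCE(p->data);
                    rcu_read_unlock_trace();
                    return val;
            }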
    
    One possible solution is to wait for an RCU grace period at the
    beginning of the RCU-tasks-trace grace period before taking the
    current-task snapshot.  However, this would add a large additional
    latency to RCU-tasks-trace grace periods.
    
    Another solution is to lock the target runqueue while taking the current
    task snapshot.  This ensures both that the update side sees the latest
    context switch and that subsequent context switches see the
    pre-grace-period update-side accesses.
    
    This commit therefore adds runqueue locking to cpu_curr_snapshot().
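    
    The following diagram (an illustrative sketch in the same style as the
    one above, not taken from the upstream commit) shows why the two required
    outcomes are now exhaustive: the snapshot and the context switch
    serialize on rq->lock:
    
            CPU 0                                           CPU 1
            -----                                           -----
    
            // Pre-GP update side access
            WRITE_ONCE(*X, 1);
            // cpu_curr_snapshot(cpu):
            rq_lock_irqsave(rq, &rf);
            smp_mb__after_spinlock();
            r0 = rq->curr;                                  spin_lock(rq)  // spins...
            rq_unlock_irqrestore(rq, &rf);
                                                            // ...acquires after the unlock
                                                            smp_mb__after_spinlock()
                                                            RCU_INIT_POINTER(rq->curr, TASK_B)
                                                            spin_unlock(rq)
                                                            rcu_read_lock_trace()
                                                            r1 = X;  // guaranteed to see 1
    
            // If the context switch takes rq->lock first instead, the
            // snapshot runs afterwards and r0 == TASK_B, so TASK_B is
            // not ignored.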
    
    Fixes: e386b6725798 ("rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs")
    Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 6f48f565e3acb..456c956f481ef 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1531,6 +1531,16 @@ static void rcu_tasks_trace_pregp_step(struct list_head *hop)
 	// allow safe access to the hop list.
 	for_each_online_cpu(cpu) {
 		rcu_read_lock();
+		// Note that cpu_curr_snapshot() picks up the target
+		// CPU's current task while its runqueue is locked with
+		// an smp_mb__after_spinlock().  This ensures that either
+		// the grace-period kthread will see that task's read-side
+		// critical section or the task will see the updater's pre-GP
+		// accesses.  The trailing smp_mb() in cpu_curr_snapshot()
+		// does not currently play a role other than simplify
+		// that function's ordering semantics.  If these simplified
+		// ordering semantics continue to be redundant, that smp_mb()
+		// might be removed.
 		t = cpu_curr_snapshot(cpu);
 		if (rcu_tasks_trace_pertask_prep(t, true))
 			trc_add_holdout(t, hop);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cac41c49bd2f5..753d7208123bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4318,12 +4318,7 @@ int task_call_func(struct task_struct *p, task_call_f func, void *arg)
  * @cpu: The CPU on which to snapshot the task.
  *
  * Returns the task_struct pointer of the task "currently" running on
- * the specified CPU.  If the same task is running on that CPU throughout,
- * the return value will be a pointer to that task's task_struct structure.
- * If the CPU did any context switches even vaguely concurrently with the
- * execution of this function, the return value will be a pointer to the
- * task_struct structure of a randomly chosen task that was running on
- * that CPU somewhere around the time that this function was executing.
+ * the specified CPU.
  *
  * If the specified CPU was offline, the return value is whatever it
  * is, perhaps a pointer to the task_struct structure of that CPU's idle
@@ -4337,11 +4332,16 @@ int task_call_func(struct task_struct *p, task_call_f func, void *arg)
  */
 struct task_struct *cpu_curr_snapshot(int cpu)
 {
+	struct rq *rq = cpu_rq(cpu);
 	struct task_struct *t;
+	struct rq_flags rf;
 
-	smp_mb(); /* Pairing determined by caller's synchronization design. */
+	rq_lock_irqsave(rq, &rf);
+	smp_mb__after_spinlock(); /* Pairing determined by caller's synchronization design. */
 	t = rcu_dereference(cpu_curr(cpu));
+	rq_unlock_irqrestore(rq, &rf);
 	smp_mb(); /* Pairing determined by caller's synchronization design. */
+
 	return t;
 }
 



