Patch "rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation" has been added to the 6.1-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 11 Aug 2024 09:04:32 -0400

This is a note to let you know that I've just added the patch titled

    rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation

to the 6.1-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     rcu-fix-rcu_barrier-vs-post-cpuhp_teardown_cpu-invoc.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit fadaffe6f78415b7ed03e4b808748379d531150e
Author: Frederic Weisbecker <frederic@xxxxxxxxxx>
Date:   Fri May 24 16:05:24 2024 +0200

    rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
    
    [ Upstream commit 55d4669ef1b76823083caecfab12a8bd2ccdcf64 ]
    
    When rcu_barrier() calls rcu_rdp_cpu_online() and observes that a CPU's
    bit is cleared in rnp->qsmaskinitnext, all accesses from the offline CPU
    preceding CPUHP_TEARDOWN_CPU are visible to rcu_barrier(), including
    callback expiration and counter updates.
    
    However, interrupts can still fire after stop_machine() re-enables
    interrupts and before rcutree_report_cpu_dead(). Accesses made between
    CPUHP_TEARDOWN_CPU and the clearing of rnp->qsmaskinitnext are _NOT_
    guaranteed to be seen by rcu_barrier() without proper ordering. This
    matters most when all remaining callbacks are invoked in that window,
    which makes rcutree_migrate_callbacks() return early without taking
    barrier_lock.
    
    The following theoretical race example can make rcu_barrier() hang:
    
    CPU 0                                               CPU 1
    -----                                               -----
    //cpu_down()
    smpboot_park_threads()
    //ksoftirqd is parked now
    <IRQ>
    rcu_sched_clock_irq()
       invoke_rcu_core()
    do_softirq()
       rcu_core()
          rcu_do_batch()
             // callback storm
             // rcu_do_batch() returns
             // before completing all
             // of them
       // do_softirq also returns early because of
       // timeout. It defers to ksoftirqd but
       // it's parked
    </IRQ>
    stop_machine()
       take_cpu_down()
                                                        rcu_barrier()
                                                            spin_lock(barrier_lock)
                                                            // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
    <IRQ>
    do_softirq()
       rcu_core()
          rcu_do_batch()
             //completes all pending callbacks
             //smp_mb() implied _after_ callback number dec
    </IRQ>
    
    rcutree_report_cpu_dead()
       rnp->qsmaskinitnext &= ~rdp->grpmask;
    
    rcutree_migrate_callbacks()
       // no callback, early return without locking
       // barrier_lock
                                                            //observes !rcu_rdp_cpu_online(rdp)
                                                            rcu_barrier_entrain()
                                                               rcu_segcblist_entrain()
                                                                  // Observe rcu_segcblist_n_cbs(rsclp) != 0
                                                                  // because no barrier between reading
                                                                  // rnp->qsmaskinitnext and rsclp->len
                                                                  rcu_segcblist_add_len()
                                                                     smp_mb__before_atomic()
                                                                     // will now observe the 0 count and empty
                                                                     // list, but too late, we enqueue regardless
                                                                     WRITE_ONCE(rsclp->len, rsclp->len + v);
                                                            // ignored barrier callback
                                                            // rcu barrier stall...
    
    This could be solved with a read memory barrier, enforcing the message
    passing between rnp->qsmaskinitnext and rsclp->len, matching the full
    memory barrier after rsclp->len addition in rcu_segcblist_add_len()
    performed at the end of rcu_do_batch().
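
For illustration only, the message-passing pattern described above can be sketched in userspace with C11 atomics. This is not kernel code and not part of the patch: `cpu_online` and `cb_len` are hypothetical stand-ins for the CPU's bit in rnp->qsmaskinitnext and for rsclp->len, the seq_cst fence stands in for the existing full barrier, and the acquire fence stands in for the read barrier that this commit chose not to add.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not kernel symbols:
 * cpu_online models the CPU's bit in rnp->qsmaskinitnext,
 * cb_len models rsclp->len. */
static atomic_bool cpu_online = true;
static atomic_long cb_len = 3;

/* Outgoing CPU: finish the callbacks, then go offline. */
static void outgoing_cpu(void)
{
	atomic_store_explicit(&cb_len, 0, memory_order_relaxed);
	/* Stands in for the full barrier after the length update. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&cpu_online, false, memory_order_relaxed);
}

/* Barrier side: true only if the CPU is seen offline with an empty queue. */
static bool barrier_sees_empty(void)
{
	if (atomic_load_explicit(&cpu_online, memory_order_relaxed))
		return false; /* still online: that path is not modeled here */
	/* Stands in for the read barrier discussed above: once the offline
	 * state is observed, the zero length must be observed as well. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&cb_len, memory_order_relaxed) == 0;
}
```

Without the acquire fence, a reader on a weakly ordered machine could see the offline state and still read a stale non-zero length, which is the window the race above relies on.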
    
    However, rcu_barrier() is already complicated enough and probably does
    not need more subtleties. CPU down is a slowpath and the barrier_lock
    is seldom contended. Solve the issue by unconditionally taking the
    barrier_lock in rcutree_migrate_callbacks(). This ensures that either
    rcu_barrier() sees the empty queue or its entrained callback is
    migrated.
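
As a rough userspace model of that guarantee (a sketch under stated assumptions, not the kernel implementation), the code below serializes both sides on one mutex and checks for emptiness only after taking it. `struct dying_cpu`, `barrier_side()`, `migrate_side()` and `stranded_after()` are hypothetical names introduced for this example, and a pthread mutex stands in for rcu_state.barrier_lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the outgoing CPU's callback state. */
struct dying_cpu {
	pthread_mutex_t barrier_lock; /* plays the role of rcu_state.barrier_lock */
	int n_cbs;                    /* callbacks still queued on the dead CPU */
	int migrated;                 /* callbacks handed to a surviving CPU */
};

/* Barrier side: entrain a barrier callback only if the queue is non-empty. */
static bool barrier_side(struct dying_cpu *c)
{
	bool entrained = false;

	pthread_mutex_lock(&c->barrier_lock);
	if (c->n_cbs != 0) {
		c->n_cbs++; /* barrier callback queued behind the existing ones */
		entrained = true;
	}
	pthread_mutex_unlock(&c->barrier_lock);
	return entrained;
}

/* Migration side: take the lock first, then test for emptiness. */
static void migrate_side(struct dying_cpu *c)
{
	pthread_mutex_lock(&c->barrier_lock);
	if (c->n_cbs != 0) {
		c->migrated += c->n_cbs;
		c->n_cbs = 0;
	}
	pthread_mutex_unlock(&c->barrier_lock);
}

/* Run both sides in the given order; return callbacks left stranded. */
static int stranded_after(int initial_cbs, bool barrier_first)
{
	struct dying_cpu c = {
		.barrier_lock = PTHREAD_MUTEX_INITIALIZER,
		.n_cbs = initial_cbs,
	};

	if (barrier_first) {
		barrier_side(&c);
		migrate_side(&c);
	} else {
		migrate_side(&c);
		barrier_side(&c);
	}
	return c.n_cbs;
}
```

The model only exercises the two serialized orderings that the lock permits. In both of them nothing is left behind on the dead CPU: either the barrier side finds the queue already empty and skips it, or the migration side finds the entrained callback and moves it.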
    
    Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 61f9503a5fe9c..cd6144cea5a1a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4391,11 +4391,15 @@ void rcutree_migrate_callbacks(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	bool needwake;
 
-	if (rcu_rdp_is_offloaded(rdp) ||
-	    rcu_segcblist_empty(&rdp->cblist))
-		return;  /* No callbacks to migrate. */
+	if (rcu_rdp_is_offloaded(rdp))
+		return;
 
 	raw_spin_lock_irqsave(&rcu_state.barrier_lock, flags);
+	if (rcu_segcblist_empty(&rdp->cblist)) {
+		raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
+		return;  /* No callbacks to migrate. */
+	}
+
 	WARN_ON_ONCE(rcu_rdp_cpu_online(rdp));
 	rcu_barrier_entrain(rdp);
 	my_rdp = this_cpu_ptr(&rcu_data);