Patch "rcu/nocb: Remove buggy bypass lock contention mitigation" has been added to the 6.10-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Wed, 4 Sep 2024 13:47:06 -0400

This is a note to let you know that I've just added the patch titled

    rcu/nocb: Remove buggy bypass lock contention mitigation

to the 6.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     rcu-nocb-remove-buggy-bypass-lock-contention-mitigat.patch
and it can be found in the queue-6.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 4c229c2be47fd4248d8da6d7466982318acd93c2
Author: Frederic Weisbecker <frederic@xxxxxxxxxx>
Date:   Thu Apr 25 16:18:35 2024 +0200

    rcu/nocb: Remove buggy bypass lock contention mitigation
    
    [ Upstream commit e4f78057291608f6968a6789c5ebb3bde7d95504 ]
    
    The bypass lock contention mitigation assumes there can be at most
    2 contenders on the bypass lock, following this scheme:
    
    1) One kthread takes the bypass lock
    2) Another one spins on it and increment the contended counter
    3) A third one (a bypass enqueuer) sees the contended counter on and
      busy loops waiting on it to decrement.
    
    However this assumption is wrong. There can be only one CPU to find the
    lock contended because call_rcu() (the bypass enqueuer) is the only
    bypass lock acquire site that may not already hold the NOCB lock
    beforehand, all the other sites must first contend on the NOCB lock.
    Therefore step 2) is impossible.
    
    The other problem is that the mitigation assumes that contenders all
    belong to the same rdp CPU, which is also impossible for a raw spinlock.
    In theory the warning could trigger if the enqueuer holds the bypass
    lock and another CPU flushes the bypass queue concurrently but this is
    prevented from all flush users:
    
    1) NOCB kthreads only flush if they successfully _tried_ to lock the
       bypass lock. So no contention management here.
    
    2) Flush on callbacks migration happen remotely when the CPU is offline.
       No concurrency against bypass enqueue.
    
    3) Flush on deoffloading happen either locally with IRQs disabled or
       remotely when the CPU is not yet online. No concurrency against
       bypass enqueue.
    
    4) Flush on barrier entrain happen either locally with IRQs disabled or
       remotely when the CPU is offline. No concurrency against
       bypass enqueue.
    
    For those reasons, the bypass lock contention mitigation isn't needed
    and is even wrong. Remove it but keep the warning reporting a contended
    bypass lock on a remote CPU, to keep unexpected contention awareness.
    
    Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index bae7925c497f..179f60ca0313 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -223,7 +223,6 @@ struct rcu_data {
 	struct swait_queue_head nocb_state_wq; /* For offloading state changes */
 	struct task_struct *nocb_gp_kthread;
 	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
-	atomic_t nocb_lock_contended;	/* Contention experienced. */
 	int nocb_defer_wakeup;		/* Defer wakeup of nocb_kthread. */
 	struct timer_list nocb_timer;	/* Enforce finite deferral. */
 	unsigned long nocb_gp_adv_time;	/* Last call_rcu() CB adv (jiffies). */
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 3f85577bddd4..2d9eed2bf750 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -91,8 +91,7 @@ module_param(nocb_nobypass_lim_per_jiffy, int, 0);
 
 /*
  * Acquire the specified rcu_data structure's ->nocb_bypass_lock.  If the
- * lock isn't immediately available, increment ->nocb_lock_contended to
- * flag the contention.
+ * lock isn't immediately available, perform minimal sanity check.
  */
 static void rcu_nocb_bypass_lock(struct rcu_data *rdp)
 	__acquires(&rdp->nocb_bypass_lock)
@@ -100,29 +99,12 @@ static void rcu_nocb_bypass_lock(struct rcu_data *rdp)
 	lockdep_assert_irqs_disabled();
 	if (raw_spin_trylock(&rdp->nocb_bypass_lock))
 		return;
-	atomic_inc(&rdp->nocb_lock_contended);
+	/*
+	 * Contention expected only when local enqueue collide with
+	 * remote flush from kthreads.
+	 */
 	WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
-	smp_mb__after_atomic(); /* atomic_inc() before lock. */
 	raw_spin_lock(&rdp->nocb_bypass_lock);
-	smp_mb__before_atomic(); /* atomic_dec() after lock. */
-	atomic_dec(&rdp->nocb_lock_contended);
-}
-
-/*
- * Spinwait until the specified rcu_data structure's ->nocb_lock is
- * not contended.  Please note that this is extremely special-purpose,
- * relying on the fact that at most two kthreads and one CPU contend for
- * this lock, and also that the two kthreads are guaranteed to have frequent
- * grace-period-duration time intervals between successive acquisitions
- * of the lock.  This allows us to use an extremely simple throttling
- * mechanism, and further to apply it only to the CPU doing floods of
- * call_rcu() invocations.  Don't try this at home!
- */
-static void rcu_nocb_wait_contended(struct rcu_data *rdp)
-{
-	WARN_ON_ONCE(smp_processor_id() != rdp->cpu);
-	while (WARN_ON_ONCE(atomic_read(&rdp->nocb_lock_contended)))
-		cpu_relax();
 }
 
 /*
@@ -510,7 +492,6 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 	}
 
 	// We need to use the bypass.
-	rcu_nocb_wait_contended(rdp);
 	rcu_nocb_bypass_lock(rdp);
 	ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 	rcu_segcblist_inc_len(&rdp->cblist); /* Must precede enqueue. */
@@ -1678,12 +1659,11 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
 
 	sprintf(bufw, "%ld", rsclp->gp_seq[RCU_WAIT_TAIL]);
 	sprintf(bufr, "%ld", rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
-	pr_info("   CB %d^%d->%d %c%c%c%c%c%c F%ld L%ld C%d %c%c%s%c%s%c%c q%ld %c CPU %d%s\n",
+	pr_info("   CB %d^%d->%d %c%c%c%c%c F%ld L%ld C%d %c%c%s%c%s%c%c q%ld %c CPU %d%s\n",
 		rdp->cpu, rdp->nocb_gp_rdp->cpu,
 		nocb_next_rdp ? nocb_next_rdp->cpu : -1,
 		"kK"[!!rdp->nocb_cb_kthread],
 		"bB"[raw_spin_is_locked(&rdp->nocb_bypass_lock)],
-		"cC"[!!atomic_read(&rdp->nocb_lock_contended)],
 		"lL"[raw_spin_is_locked(&rdp->nocb_lock)],
 		"sS"[!!rdp->nocb_cb_sleep],
 		".W"[swait_active(&rdp->nocb_cb_wq)],