Patch "powerpc/qspinlock: Fix stale propagated yield_cpu" has been added to the 6.5-stable tree

<gregkh@xxxxxxxxxxxxxxxxxxx> · Sun, 22 Oct 2023 15:13:40 +0200

This is a note to let you know that I've just added the patch titled

    powerpc/qspinlock: Fix stale propagated yield_cpu

to the 6.5-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     powerpc-qspinlock-fix-stale-propagated-yield_cpu.patch
and it can be found in the queue-6.5 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


From f9bc9bbe8afdf83412728f0b464979a72a3b9ec2 Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin@xxxxxxxxx>
Date: Mon, 16 Oct 2023 22:43:00 +1000
Subject: powerpc/qspinlock: Fix stale propagated yield_cpu

From: Nicholas Piggin <npiggin@xxxxxxxxx>

commit f9bc9bbe8afdf83412728f0b464979a72a3b9ec2 upstream.

yield_cpu is a sampled CPU number of a preempted lock holder that gets
propagated back through the queue. Queued waiters use this to yield to the
preempted lock holder without continually sampling the lock word (which
would defeat the purpose of MCS queueing by bouncing the cache line).
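
A minimal userspace sketch of the propagation idea, for illustration only:
struct node, propagate_yield() and head_sampled() are hypothetical
simplifications, not the kernel's struct qnode or its helpers. Only the
queue head reads the lock word; each waiter then hands the sample on to
its successor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified queue node; not the kernel's struct qnode. */
struct node {
	struct node *next;  /* successor in the MCS queue, NULL at the tail */
	int yield_cpu;      /* propagated owner sample, -1 if none yet */
};

/*
 * Forward this waiter's owner sample to its successor, so the successor
 * can yield to the lock holder without reading the shared lock word.
 */
static void propagate_yield(struct node *me)
{
	assert(me != NULL);
	if (me->next != NULL && me->yield_cpu != -1)
		me->next->yield_cpu = me->yield_cpu;
}

/* Only the queue head samples the lock word; the value ripples down. */
static void head_sampled(struct node *head, int owner_cpu)
{
	head->yield_cpu = owner_cpu;
	for (struct node *n = head; n != NULL; n = n->next)
		propagate_yield(n);
}
```

Calling head_sampled() on a three-node queue with owner CPU 7 leaves every
waiter's yield_cpu at 7, while only the head ever touched the lock.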

The problem is that yield_cpu can become stale. It can take some time to
be passed down the chain, and if any queued waiter gets preempted then
it will cease to propagate the yield_cpu to later waiters.
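
The staleness can be shown with a similarly hypothetical sketch
(propagate_pass() and stale_scenario() are illustrative names, not kernel
code). A preempted waiter is modelled as one that simply forwards nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified waiter; not the kernel's struct qnode. */
struct node {
	struct node *next;
	int yield_cpu;   /* last owner sample received, -1 if none */
	bool preempted;  /* a preempted vCPU runs no code, so it forwards nothing */
};

/* One pass down the queue: every running waiter forwards its sample. */
static void propagate_pass(struct node *w, size_t n)
{
	assert(w != NULL);
	for (size_t i = 0; i < n; i++)
		if (!w[i].preempted && w[i].next != NULL && w[i].yield_cpu != -1)
			w[i].next->yield_cpu = w[i].yield_cpu;
}

/*
 * Scenario: CPU 7 owns the lock and its sample reaches every waiter.
 * Waiter 1 is then preempted, the lock passes to CPU 2, and the head
 * re-samples. Writes each waiter's final yield_cpu into out[].
 */
static void stale_scenario(int out[4])
{
	struct node w[4];
	for (size_t i = 0; i < 4; i++) {
		w[i].next = (i + 1 < 4) ? &w[i + 1] : NULL;
		w[i].yield_cpu = -1;
		w[i].preempted = false;
	}
	w[0].yield_cpu = 7;
	propagate_pass(w, 4);   /* all four waiters now hold 7 */
	w[1].preempted = true;
	w[0].yield_cpu = 2;     /* head sees the new owner */
	propagate_pass(w, 4);   /* 2 reaches waiter 1 and stops there */
	for (size_t i = 0; i < 4; i++)
		out[i] = w[i].yield_cpu;
}
```

Waiters 2 and 3 are left holding CPU 7, which no longer owns the lock.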

This can result in yielding to a CPU that no longer holds the lock.
That is bad in itself, but it is worse if that CPU is currently idle
in H_CEDE: it then appears to be preempted, and some hypervisors
(PowerVM) can cause very long H_CONFER latencies while waiting for the
H_CEDE wakeup. This results in latency spikes and hard lockups on
oversubscribed partitions with lock contention.

This is a minimal fix. Before yielding to yield_cpu, sample the lock
word to confirm yield_cpu is still the owner, and bail out if it is not.

Thanks to a bunch of people who reported this and tracked down the
exact problem using tracepoints and dispatch trace logs.

Fixes: 28db61e207ea ("powerpc/qspinlock: allow propagation of yield CPU down the queue")
Cc: stable@xxxxxxxxxxxxxxx # v6.2+
Reported-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
Reported-by: Laurent Dufour <ldufour@xxxxxxxxxxxxx>
Reported-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
Debugged-by: "Nysal Jan K.A" <nysal@xxxxxxxxxxxxx>
Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
Tested-by: Shrikanth Hegde <sshegde@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
Link: https://msgid.link/20231016124305.139923-2-npiggin@xxxxxxxxx
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 arch/powerpc/lib/qspinlock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 253620979d0c..6dd2f46bd3ef 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -406,6 +406,9 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
 	if ((yield_count & 1) == 0)
 		goto yield_prev; /* owner vcpu is running */
 
+	if (get_owner_cpu(READ_ONCE(lock->val)) != yield_cpu)
+		goto yield_prev; /* re-sample lock owner */
+
 	spin_end();
 
 	preempted = true;
-- 
2.42.0



Patches currently in stable-queue which might be from npiggin@xxxxxxxxx are

queue-6.5/powerpc-qspinlock-fix-stale-propagated-yield_cpu.patch