On Fri, Dec 16, 2022 at 11:14:12AM +0000, Will Deacon wrote:
> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > index 35212f260148..af0dbe4d5e97 100644
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -238,6 +238,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
> >  		owner = *p;
> >  	} while (cmpxchg_relaxed(p, owner,
> >  				 owner | RT_MUTEX_HAS_WAITERS) != owner);
> > +
> > +	/*
> > +	 * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
> > +	 * operations in the event of contention. Ensure the successful
> > +	 * cmpxchg is visible.
> > +	 */
> > +	smp_mb__after_atomic();
> 
> Could we use smp_acquire__after_ctrl_dep() instead?
> 

It might be sufficient, but it's not as obviously correct. It lacks an
obvious pairing that is definitely correct; the control dependency
combined with smp_acquire__after_ctrl_dep() *should* be sufficient
against the lock fast path based on the available documentation, but I
was unable to convince myself that this is definitely correct on all
CPUs.

Given that arm64 was trivial to crash on PREEMPT_RT before the patch
(hackbench pipes also triggers the same problem), I'm reluctant to try
to be too clever, particularly as I didn't have a reproducer for a
problem in this specific path. If someone can demonstrate a reasonable
case where smp_mb__after_atomic() is too heavy and document that it can
be safely relaxed, then great. At least if that relaxing is wrong, there
will be a bisection point between the fix and the reintroduction of
damage.

-- 
Mel Gorman
SUSE Labs
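
[For illustration only, a minimal sketch of what the suggested
alternative might look like in the mark_rt_mutex_waiters() context from
the quoted diff. This is not a proposal; whether the control dependency
plus smp_acquire__after_ctrl_dep() is sufficient against the lock fast
path on all CPUs is exactly the open question above.]

static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
{
	unsigned long owner, *p = (unsigned long *) &lock->owner;

	do {
		owner = *p;
	} while (cmpxchg_relaxed(p, owner,
				 owner | RT_MUTEX_HAS_WAITERS) != owner);

	/*
	 * Hypothetical variant: rely on the control dependency from the
	 * cmpxchg loop's exit condition, promoted to ACQUIRE by
	 * smp_acquire__after_ctrl_dep(), instead of the full barrier
	 * smp_mb__after_atomic() used in the applied fix. Whether this
	 * ordering is strong enough here is unresolved.
	 */
	smp_acquire__after_ctrl_dep();
}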