On Fri, Dec 02, 2022 at 12:21:06PM +0100, Sebastian Andrzej Siewior wrote:
> On 2022-12-02 10:02:23 [+0000], Mel Gorman wrote:
> > The lock owner is updated with an IRQ-safe raw spinlock held but the
> > spin_unlock does not provide acquire semantics which are needed when
> > acquiring a mutex. This patch adds the necessary acquire semantics for a
> > lock operation when the lock owner is updated. It successfully completed
> > 10 iterations of the dbench workload while the vanilla kernel fails on
> > the first iteration.
>
> I *think* it is
>
> Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
>

Adding Davidlohr to cc. It might have made the problem worse but even
then rt_mutex_set_owner was just a plain assignment and, while I didn't
check carefully, at a glance try_to_take_rt_mutex didn't look like it
guaranteed ACQUIRE semantics.

> Before that, it did cmpxchg() which should be fine.
>
> Regarding mark_rt_mutex_waiters(). Isn't acquire semantic required in
> order for the lock-owner not perform the fastpath but go to the slowpath
> instead?
>

Good spot, it does. While the most straight-forward solution is to use
cmpxchg_acquire, I think it is overkill because it could incur back-to-back
ACQUIRE operations in the event of contention. There could be a smp_wmb
after the cmpxchg_relaxed but that impacts all arches and a non-paired
smp_wmb is generally frowned upon. I'm thinking this on top of the patch
should be sufficient, even though it's a heavier operation than is
necessary for ACQUIRE as well as being "not typical" according to
Documentation/atomic_t.txt. Will, as this affects ARM primarily, do you
have any preference?

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 35212f260148..af0dbe4d5e97 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -238,6 +238,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
 		owner = *p;
 	} while (cmpxchg_relaxed(p, owner,
 				 owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+	/*
+	 * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+	 * operations in the event of contention. Ensure the successful
+	 * cmpxchg is visible.
+	 */
+	smp_mb__after_atomic();
 }
 
 /*
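
For reference, the cmpxchg_acquire alternative mentioned above would look
something like the untested sketch below (same loop as the current
mark_rt_mutex_waiters, only the cmpxchg variant changed), shown only to
make the comparison concrete:

static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
{
	unsigned long owner, *p = (unsigned long *) &lock->owner;

	/*
	 * Untested sketch for comparison only: cmpxchg_acquire orders the
	 * successful update of the waiters bit without needing a separate
	 * barrier afterwards.
	 */
	do {
		owner = *p;
	} while (cmpxchg_acquire(p, owner,
				 owner | RT_MUTEX_HAS_WAITERS) != owner);
}

The downside is that this ACQUIRE sits right next to the ACQUIRE performed
when the lock itself is taken, which is the back-to-back cost under
contention I'd rather avoid, hence the smp_mb__after_atomic() in the diff.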