On Fri, Dec 02, 2022 at 03:01:58PM +0000, Mel Gorman wrote:
> On Fri, Dec 02, 2022 at 12:21:06PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2022-12-02 10:02:23 [+0000], Mel Gorman wrote:
> > > The lock owner is updated with an IRQ-safe raw spinlock held but the
> > > spin_unlock does not provide acquire semantics which are needed when
> > > acquiring a mutex. This patch adds the necessary acquire semantics for a
> > > lock operation when the lock owner is updated. It successfully completed
> > > 10 iterations of the dbench workload while the vanilla kernel fails on
> > > the first iteration.
> >
> > I *think* it is
> >
> > Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
> >
> > Adding Davidlohr to cc.
>
> It might have made the problem worse but even then rt_mutex_set_owner was
> just a plain assignment and while I didn't check carefully, at a glance
> try_to_take_rt_mutex didn't look like it guaranteed ACQUIRE semantics.
>
> > Before that, it did cmpxchg() which should be fine.
> >
> > Regarding mark_rt_mutex_waiters(). Isn't acquire semantics required in
> > order for the lock owner not to take the fastpath but go to the slowpath
> > instead?
> >
>
> Good spot, it does. While the most straightforward solution is to use
> cmpxchg_acquire, I think it is overkill because it could incur back-to-back
> ACQUIRE operations in the event of contention. There could be a smp_wmb
> after the cmpxchg_relaxed but that impacts all arches and a non-paired
> smp_wmb is generally frowned upon.
>
> I'm thinking this on top of the patch should be sufficient even though
> it's a heavier operation than is necessary for ACQUIRE as well as being
> "not typical" according to Documentation/atomic_t.txt. Will, as this
> affects ARM primarily do you have any preference?
>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 35212f260148..af0dbe4d5e97 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -238,6 +238,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
> 		owner = *p;
> 	} while (cmpxchg_relaxed(p, owner,
> 				 owner | RT_MUTEX_HAS_WAITERS) != owner);
> +
> +	/*
> +	 * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
> +	 * operations in the event of contention. Ensure the successful
> +	 * cmpxchg is visible.
> +	 */
> +	smp_mb__after_atomic();

Could we use smp_acquire__after_ctrl_dep() instead?

Will