On 13 Oct 2022 12:02:09 +0200 Peter Zijlstra <peterz@xxxxxxxxxxxxx> > On Wed, Oct 12, 2022 at 09:33:32AM -0400, Waiman Long wrote: > > A non-first waiter can potentially spin in the for loop of > > rwsem_down_write_slowpath() without sleeping but fail to acquire the > > lock even if the rwsem is free if the following sequence happens: > > > > Non-first waiter First waiter Lock holder > > ---------------- ------------ ----------- > > Acquire wait_lock > > rwsem_try_write_lock(): > > Set handoff bit if RT or > > wait too long > > Set waiter->handoff_set > > Release wait_lock > > Acquire wait_lock > > Inherit waiter->handoff_set > > Release wait_lock > > Clear owner > > Release lock > > if (waiter.handoff_set) { > > rwsem_spin_on_owner((); > > if (OWNER_NULL) > > goto trylock_again; > > } > > trylock_again: > > Acquire wait_lock > > rwsem_try_write_lock(): > > if (first->handoff_set && (waiter != first)) > > return false; > > Release wait_lock > > > > It is especially problematic if the non-first waiter is an RT task and > > it is running on the same CPU as the first waiter as this can lead to > > live lock. > > > So why not do a better handoff? Specifically, have the owner set owner > to first-waiter instead of NULL ? (like the normal mutex code) Given a simple coding of "better handoff", with care to avoid change added to fast path, I see no bonus except for preventing non-first waiters from spinning. Hillf +++ b/kernel/locking/rwsem.c @@ -429,6 +429,7 @@ static void rwsem_mark_wake(struct rw_se if (waiter->type == RWSEM_WAITING_FOR_WRITE) { if (wake_type == RWSEM_WAKE_ANY) { + atomic_long_set(&sem->owner, (long)waiter->task); /* * Mark writer at the front of the queue for wakeup. * Until the task is actually later awoken later by @@ -752,7 +753,7 @@ rwsem_owner_state(struct task_struct *ow static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem) { - struct task_struct *new, *owner; + struct task_struct *new, *owner, *me = current; unsigned long flags, new_flags; enum owner_state state; @@ -762,6 +763,8 @@ rwsem_spin_on_owner(struct rw_semaphore state = rwsem_owner_state(owner, flags); if (state != OWNER_WRITER) return state; + if (owner == me) + return OWNER_NULL; for (;;) { /* @@ -772,7 +775,10 @@ rwsem_spin_on_owner(struct rw_semaphore */ new = rwsem_owner_flags(sem, &new_flags); if ((new != owner) || (new_flags != flags)) { - state = rwsem_owner_state(new, new_flags); + if (new == me) + state = OWNER_NULL; + else + state = rwsem_owner_state(new, new_flags); break; }