On 23.11.2016 15:25, Daniel Vetter wrote:
On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote:
On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
@@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
*/
mutex_clear_owner(&lock->base);
#endif
- __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+ /*
+ * A previously _not_ waiting task may acquire the lock via the fast
+ * path during our unlock. In that case, already waiting tasks may have
+ * to back off to avoid a deadlock. Wake up all waiters so that they
+ * can check their acquire context stamp against the new owner.
+ */
+ __mutex_fastpath_unlock(&lock->base.count,
+ __mutex_unlock_slowpath_wakeall);
}
So doing a wake-all has obvious issues with thundering herd etc. Also,
with the new mutex, you'd not be able to do hand-off, which would
introduce starvation cases.
Ideally we'd iterate the blocked list and pick the waiter with the
earliest stamp, or we'd maintain the list in stamp order instead of
FIFO, for ww_mutex.
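Keeping the wait list sorted by stamp, as suggested above, could look
roughly like the following simplified userspace model. The names and
the singly-linked list layout are purely illustrative, not the kernel's
actual struct mutex_waiter / list.h machinery:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a ww_mutex wait list kept in stamp order
 * (earliest stamp first).  In wound/wait terms, a lower stamp means
 * an older acquire context, which should win contention. */
struct waiter {
	unsigned long stamp;	/* acquire-context stamp; lower = older */
	struct waiter *next;
};

/* Insert w so the list stays sorted by ascending stamp.  Stamps are
 * unique per acquire context, so tie-breaking is not a concern. */
static void add_waiter_sorted(struct waiter **head, struct waiter *w)
{
	struct waiter **pos = head;

	while (*pos && (*pos)->stamp < w->stamp)
		pos = &(*pos)->next;
	w->next = *pos;
	*pos = w;
}
```

With this ordering, the unlock path can wake only the head of the list
(plus anyone who must back off), instead of waking every waiter.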
Not sure we'll win that much; at least I think we still need to wake up
everyone with an earlier stamp than that of the task that just released
the lock, otherwise there are deadlocks. So that only cuts the wakeups
in half, on average.
What we could do is a handoff hint carrying the timestamp of the earliest
task we believe should get the lock. Everyone with a later timestamp who
gets woken then knows that they definitely have a deadlock situation and
need to back off (thread 2 in the example).
thread 1 would get woken and would be able to take the lock, except when
thread 0 successfully raced it and stole the lock. Anyone else racing
in with a later timestamp would also immediately back off, ensuring
fairness.
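The back-off decision in the handoff-hint scheme above reduces to a
stamp comparison. A minimal sketch, with made-up names (the kernel has
no must_back_off() helper; this only illustrates the rule being
proposed):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the handoff-hint back-off rule: the unlock
 * path publishes the stamp of the earliest waiter it believes should
 * get the lock.  A woken waiter whose own stamp is later (larger)
 * knows an older context is waiting and must back off to break the
 * potential deadlock cycle. */
static bool must_back_off(unsigned long my_stamp, unsigned long hint_stamp)
{
	/* Lower stamp == older acquire context == wins contention
	 * under the wound/wait rules. */
	return my_stamp > hint_stamp;
}
```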
I did consider maintaining stamp order in the waiter list and originally
decided against it because I just wanted a simple and conservative fix
to avoid adding new regressions.
Now that a different approach is needed for >= 4.9 anyway, mostly due to
the hand-off logic, I'm reconsidering this.
I do believe we can win a bit by keeping the wait list sorted, if we
also make sure that waiters don't add themselves in the first place if
they see that a deadlock situation cannot be avoided.
I will probably want to extend struct mutex_waiter with
ww_mutex-specific fields to facilitate this (i.e. a ctx pointer, and
perhaps the stamp as well to reduce pointer-chasing). That should be
fine since it lives on the stack.
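A rough sketch of what such an extended on-stack waiter might look
like. All names here are stand-ins for illustration, not the eventual
kernel layout:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ww_acquire_ctx: only the stamp matters here. */
struct ww_acquire_ctx_model {
	unsigned long stamp;
};

/* Stand-in for an extended struct mutex_waiter.  The ctx pointer and
 * a cached stamp copy live in the waiter itself, so the wakeup path
 * can compare stamps without chasing the ctx pointer.  Since the
 * waiter is allocated on the acquiring task's stack, the extra fields
 * cost nothing long-term. */
struct mutex_waiter_model {
	struct mutex_waiter_model *next;	/* wait-list linkage */
	struct ww_acquire_ctx_model *ww_ctx;	/* NULL for plain mutex waiters */
	unsigned long stamp;			/* cached copy of ww_ctx->stamp */
};
```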
In the meantime, I'd appreciate it if patch #1 could be accepted as-is
for stable updates to <= 4.8. It fixes a real (if rare) bug, and the
stampede inefficiency isn't a problem in practice at least for GPU
applications.
Thanks,
Nicolai
Without thinking it through in detail, this is a PI (priority
inheritance) issue, except that we replace boosting with wakeup &
back-off. Could we perhaps steal something from rt mutexes to make it
fair & efficient?
-Daniel