On 2023-02-20 12:04:56 [+0100], To Thomas Gleixner wrote:
> The ->pi_blocked_on field is set by __rwbase_read_lock() before
> schedule() is invoked while blocking on the sleeping lock. By doing this
> we avoid __blk_flush_plug() and as such may deadlock, because we are
> going to sleep after having made I/O progress earlier which is not
> globally visible but is (in the deadlock case) expected by the owner of
> the lock.
>
> We could trylock and, if this fails, flush and do the proper lock.
> This would ensure that we set pi_blocked_on after we flushed.

Something like the diff below takes down_read() and down_write() into
account. read_lock()/write_lock() are excluded via the state check
(TASK_RTLOCK_WAIT). mutex_t is missing; it needs to be flushed before
pi_blocked_on is assigned, i.e. before the wait lock is acquired:

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 728f434de2bbf..95731d0c9e87f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1700,6 +1700,13 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
+	if (state != TASK_RTLOCK_WAIT) {
+		/*
+		 * If we are going to sleep and we have plugged IO queued,
+		 * make sure to submit it to avoid deadlocks.
+		 */
+		blk_flush_plug(current->plug, true);
+	}
 	return rt_mutex_slowlock(lock, NULL, state);
 }
 #endif /* RT_MUTEX_BUILD_MUTEX */
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index c201aadb93017..6c6c88a2d9228 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -143,6 +143,14 @@ static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
+	if (state != TASK_RTLOCK_WAIT) {
+		/*
+		 * If we are going to sleep and we have plugged IO queued,
+		 * make sure to submit it to avoid deadlocks.
+		 */
+		blk_flush_plug(current->plug, true);
+	}
+
 	return __rwbase_read_lock(rwb, state);
 }

Sebastian
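
For illustration only, the trylock-then-flush ordering the diff enforces
can be reduced to a small userspace sketch. This is a hypothetical
pthread analogy, not kernel code: flush_plugged_io() and
lock_with_flush() are made-up stand-ins for blk_flush_plug() and the
rt_mutex lock path. The point is that the fast path never flushes, and
the flush happens strictly before the blocking acquisition, i.e. before
pi_blocked_on would be assigned in the kernel:

/* Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>

/* Stand-in for blk_flush_plug(): submit deferred work before we block,
 * so a lock owner waiting on that work cannot deadlock against us. */
static void flush_plugged_io(void)
{
	puts("flushing plugged I/O before blocking");
}

static void lock_with_flush(pthread_mutex_t *m)
{
	/* Fast path: lock is free, we will not sleep, no flush needed. */
	if (pthread_mutex_trylock(m) == 0)
		return;

	/* We are about to block: make earlier I/O progress visible first. */
	flush_plugged_io();

	/* Slow path: this is where the kernel would set pi_blocked_on
	 * and go into schedule(). */
	pthread_mutex_lock(m);
}

int main(void)
{
	pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

	lock_with_flush(&m);	/* uncontended: trylock succeeds, no flush */
	pthread_mutex_unlock(&m);
	return 0;
}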