On Mon, 2023-02-20 at 19:21 +0100, Thomas Gleixner wrote:
> On Mon, Feb 20 2023 at 12:42, Sebastian Andrzej Siewior wrote:
> > On 2023-02-20 12:04:56 [+0100], To Thomas Gleixner wrote:
> > > The ->pi_blocked_on field is set by __rwbase_read_lock() before
> > > schedule() is invoked while blocking on the sleeping lock. By doing this
> > > we avoid __blk_flush_plug() and as such may deadlock, because we are
> > > going to sleep after having made I/O progress earlier which is not
> > > globally visible but might be (s/might be/is/ in the deadlock case)
> > > expected by the owner of the lock.
>
> Fair enough.
>
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -1700,6 +1700,13 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
> >  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
> >  		return 0;
> >  
> > +	if (state != TASK_RTLOCK_WAIT) {
> > +		/*
> > +		 * If we are going to sleep and we have plugged IO queued,
> > +		 * make sure to submit it to avoid deadlocks.
> > +		 */
> > +		blk_flush_plug(tsk->plug, true);
>
> This still leaves the problem vs. io_wq_worker_sleeping() and its
> running() counterpart after schedule().

The closest thing I can see to a problem there is
io_wqe_dec_running()->io_queue_worker_create()->io_wq_cancel_tw_create()
->kfree(), but that only happens with func == create_worker_cont(), and
io_wqe_dec_running() uses create_worker_cb().

Are there any workloads I could run to stress that path (with my
asserts in place)?

-Scott
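For reference, the quoted hunk is trimmed mid-block. Applied to the
mainline __rt_mutex_lock() of that era, the proposed change would read
roughly as below. This is a sketch of the idea, not the patch as
posted: the tsk local is an assumption (presumably defined in the
trimmed part of the diff), and the tail of the function follows current
mainline.

static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
					   unsigned int state)
{
	struct task_struct *tsk = current;	/* assumed; trimmed from the quote */

	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
		return 0;

	if (state != TASK_RTLOCK_WAIT) {
		/*
		 * If we are going to sleep and we have plugged IO queued,
		 * make sure to submit it to avoid deadlocks.
		 */
		blk_flush_plug(tsk->plug, true);
	}

	return rt_mutex_slowlock(lock, NULL, state);
}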
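On Thomas's point about the worker hooks: io_wq_worker_sleeping() and
io_wq_worker_running() are invoked from schedule() itself, bracketing
__schedule() alongside the plug flush done in sched_submit_work(). A
simplified sketch of schedule() from kernel/sched/core.c of that era
(comments mine, details elided):

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	/*
	 * Before blocking: notifies the workqueue/io_uring worker
	 * infrastructure (wq_worker_sleeping()/io_wq_worker_sleeping())
	 * and calls blk_flush_plug().
	 */
	sched_submit_work(tsk);
	do {
		preempt_disable();
		__schedule(SM_NONE);
		sched_preempt_enable_no_resched();
	} while (need_resched());
	/*
	 * After waking: wq_worker_running()/io_wq_worker_running().
	 * Flushing the plug from the rtmutex slow path replicates only
	 * the blk_flush_plug() part of sched_submit_work(), not these
	 * worker notifications -- which is the gap Thomas is pointing at.
	 */
	sched_update_worker(tsk);
}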