On 2023-02-20 10:49:26 [+0100], Thomas Gleixner wrote:
> > The logic is different but the deadlock should be avoided:
> > - mutex_t and rw_semaphore invoke schedule() while blocking on a lock.
> >   As part of schedule() sched_submit_work() is invoked.
> >   This is the same in RT and !RT so I don't expect any deadlock since
> >   the involved locks are the same.
>
> Huch?
>
>     xlog_cil_commit()
>       down_read(&cil->xc_ctx_lock)
>         __rwbase_read_lock()
>           __rt_mutex_slowlock()
>             current->pi_blocked_on = ...
>             schedule()
>               __blk_flush_plug()
>                 dd_insert_requests()
>                   rt_spin_lock()
>                     WARN_ON(current->pi_blocked_on);
>
> So something like the below is required. But that might not cut it
> completely. wq_worker_sleeping() is fine, but I'm not convinced that
> io_wq_worker_sleeping() is safe. That needs some investigation.

Okay, so this makes sense.

> Thanks,
>
>         tglx
> ---
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6666,6 +6666,9 @@ static inline void sched_submit_work(str
>  	 */
>  	SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);
>
> +	if (current->pi_blocked_on)
> +		return;
> +

The ->pi_blocked_on field is set by __rwbase_read_lock() before
schedule() is invoked while blocking on the sleeping lock. By returning
early here we avoid __blk_flush_plug() and as a consequence may
deadlock: we go to sleep with I/O queued earlier that is not yet
globally visible, but that is (in the deadlock case) exactly what the
owner of the lock is waiting for.

We could trylock first and, if that fails, flush the plug and then take
the lock properly. That would ensure pi_blocked_on is only set after we
flushed; a rough sketch of the idea is at the end of this mail.

>  	/*
>  	 * If we are going to sleep and we have plugged IO queued,
>  	 * make sure to submit it to avoid deadlocks.

Sebastian
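
A minimal sketch of the trylock-then-flush idea, based on
rwbase_read_lock() in kernel/locking/rwbase_rt.c. Untested, and the
placement of the blk_flush_plug() call is my assumption about where the
flush would be safe, not a finished patch:

	static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
						    unsigned int state)
	{
		/* Fast path: lock acquired, nothing to flush. */
		if (rwbase_read_trylock(rwb))
			return 0;

		/*
		 * Slow path ahead: __rwbase_read_lock() will set
		 * current->pi_blocked_on and then schedule(). Submit any
		 * plugged I/O now, while we can still take the sleeping
		 * locks the flush path itself needs.
		 */
		if (current->plug)
			blk_flush_plug(current->plug, true);

		return __rwbase_read_lock(rwb, state);
	}

With something like that, sched_submit_work() could keep the early
return on ->pi_blocked_on without losing the plug flush, because the
flush already happened before the task could block on the rtmutex.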