On Mon, 2023-02-20 at 19:21 +0100, Thomas Gleixner wrote:
> On Mon, Feb 20 2023 at 12:42, Sebastian Andrzej Siewior wrote:
> > On 2023-02-20 12:04:56 [+0100], To Thomas Gleixner wrote:
> > > The ->pi_blocked_on field is set by __rwbase_read_lock() before
> > > schedule() is invoked while blocking on the sleeping lock. By doing this
> > > we avoid __blk_flush_plug() and as such may deadlock, because we are
> > > going to sleep after having made I/O progress earlier which is not
> > > globally visible but might be (s/might be/is/ in the deadlock case)
> > > expected by the owner of the lock.
>
> Fair enough.
>
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -1700,6 +1700,13 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
> >  	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
> >  		return 0;
> >  
> > +	if (state != TASK_RTLOCK_WAIT) {
> > +		/*
> > +		 * If we are going to sleep and we have plugged IO queued,
> > +		 * make sure to submit it to avoid deadlocks.
> > +		 */
> > +		blk_flush_plug(tsk->plug, true);
>
> This still leaves the problem vs. io_wq_worker_sleeping() and its
> running() counterpart after schedule().

The closest thing I can see to a problem there is
io_wqe_dec_running()->io_queue_worker_create()->io_wq_cancel_tw_create()
->kfree(), but that only happens with func == create_worker_cont(), and
io_wqe_dec_running() uses create_worker_cb().

Are there any workloads I could run to stress that path (with my
asserts in place)?

-Scott
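For reference, the quoted hunk is trimmed mid-block. Applied to the
mainline __rt_mutex_lock() of that era, the proposed change would read
roughly as below. This is a sketch of the idea, not the patch as
posted: the tsk local is an assumption (presumably defined in the
trimmed part of the diff), and the tail of the function follows current
mainline.

static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
					   unsigned int state)
{
	struct task_struct *tsk = current;	/* assumed; trimmed from the quote */

	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
		return 0;

	if (state != TASK_RTLOCK_WAIT) {
		/*
		 * If we are going to sleep and we have plugged IO queued,
		 * make sure to submit it to avoid deadlocks.
		 */
		blk_flush_plug(tsk->plug, true);
	}

	return rt_mutex_slowlock(lock, NULL, state);
}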
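On Thomas's point about the worker hooks: io_wq_worker_sleeping() and
io_wq_worker_running() are invoked from schedule() itself, bracketing
__schedule() alongside the plug flush done in sched_submit_work(). A
simplified sketch of schedule() from kernel/sched/core.c of that era
(comments mine, details elided):

asmlinkage __visible void __sched schedule(void)
{
	struct task_struct *tsk = current;

	/*
	 * Before blocking: notifies the workqueue/io_uring worker
	 * infrastructure (wq_worker_sleeping()/io_wq_worker_sleeping())
	 * and calls blk_flush_plug().
	 */
	sched_submit_work(tsk);
	do {
		preempt_disable();
		__schedule(SM_NONE);
		sched_preempt_enable_no_resched();
	} while (need_resched());
	/*
	 * After waking: wq_worker_running()/io_wq_worker_running().
	 * Flushing the plug from the rtmutex slow path replicates only
	 * the blk_flush_plug() part of sched_submit_work(), not these
	 * worker notifications -- which is the gap Thomas is pointing at.
	 */
	sched_update_worker(tsk);
}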