Re: rtmutex, pi_blocked_on, and blk_flush_plug()

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Thu, 16 Feb 2023 15:17:57 +0100

On 2023-02-09 22:31:57 [-0600], Crystal Wood wrote:
> Hello!
Hi,

> It is possible for blk_flush_plug() to be called while
> current->pi_blocked_on is set, in the process of trying to acquire an rwsem.
> If the block flush blocks trying to acquire some lock, then it appears that
> current->pi_blocked_on will be overwritten, and then set to NULL once that
> lock is acquired, even though the task is still blocked on the original
> rwsem.  Am I missing something that deals with this situation?  It seems
> like the lock types that are supposed to call blk_flush_plug() should do so
> before calling task_blocks_on_rt_mutex().

Do you experience a problem in v6.1-RT?

> I originally noticed this while investigating a related issue on an older
> RHEL kernel where task_blocked_on_mutex() has a BUG_ON if entered with
> current->pi_blocked_on non-NULL.  Current kernels lack this check.

The logic is different but the deadlock should be avoided:
- mutex_t and rw_semaphore invoke schedule() while blocking on a lock.
  As part of schedule() sched_submit_work() is invoked.
  This is the same in RT and !RT so I don't expect any deadlock since
  the involved locks are the same.

- spinlock_t invokes schedule_rtlock() which avoids sched_submit_work().
  This is the same behaviour as with !RT, where it spins and does not
  submit work either.
  rwlock_t should behave the same way but invokes schedule() instead.
  This looks wrong. And it could deadlock in sched_submit_work().
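
To illustrate the difference between the two sleep paths, here is a rough
user-space toy model. It is not kernel code. Every toy_* name is made up for
this sketch, and it assumes the flush has to take a second, contended sleeping
lock, which is the scenario described in the report. It only shows how the
recorded waiter could be overwritten and then cleared when the schedule()-like
path submits work, and how the schedule_rtlock()-like path leaves it alone.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are kernel definitions. */
struct toy_lock {
	const char *name;
};

struct toy_task {
	struct toy_lock *pi_blocked_on;	/* lock the task is recorded as waiting for */
	int has_plugged_io;		/* non-zero if a flush would be attempted */
};

/* Record the lock we are about to sleep on. */
static void toy_block_on(struct toy_task *t, struct toy_lock *l)
{
	assert(l != NULL);
	t->pi_blocked_on = l;
}

/* Clear the record once the lock has been acquired. */
static void toy_acquired(struct toy_task *t)
{
	t->pi_blocked_on = NULL;
}

/* Assumption: the flush needs a second sleeping lock that is contended. */
static void toy_flush_plug(struct toy_task *t, struct toy_lock *inner)
{
	if (!t->has_plugged_io)
		return;
	toy_block_on(t, inner);	/* overwrites the record for the outer lock */
	toy_acquired(t);	/* ...and then resets it to NULL */
	t->has_plugged_io = 0;
}

/* schedule()-like path: submits pending work before sleeping. */
static void toy_schedule(struct toy_task *t, struct toy_lock *inner)
{
	toy_flush_plug(t, inner);
}

/* schedule_rtlock()-like path: no work submission at all. */
static void toy_schedule_rtlock(struct toy_task *t)
{
	(void)t;
}

/* Returns 1 if the outer wait record survived the sleep path, else 0. */
static int toy_record_survives(int use_rtlock_path)
{
	struct toy_lock outer = { "outer" };
	struct toy_lock inner = { "inner" };
	struct toy_task t = { NULL, 1 };

	toy_block_on(&t, &outer);
	if (use_rtlock_path)
		toy_schedule_rtlock(&t);
	else
		toy_schedule(&t, &inner);

	return t.pi_blocked_on == &outer;
}
```

In this model toy_record_survives(1) keeps the outer record while
toy_record_survives(0) loses it. Whether the real code can reach the second
case depends on where the flush happens relative to the point at which
pi_blocked_on is set, which the toy does not try to reproduce.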

> To demonstrate that the recursive blocking scenario can happen (without
> actually waiting to hit the scenario where the second lock is contended),
> I put a WARN_ON_ONCE(current->pi_blocked_on) in rtlock_lock() (plus a few
> other places, but this is the one I hit):

XFS does not use rwlock_t directly.

> -Crystal

Sebastian