On 2022/09/30 9:19, Shinichiro Kawasaki wrote:
> Kernel v6.0-rc7 triggered the WARN "possible circular locking dependency
> detected" at blktests test case block/011 for NVME devices [1]. The trigger
> commit was c0feea594e05 ("workqueue: don't skip lockdep work dependency in
> cancel_work_sync()"). The commit was backported to the v5.15.71 stable kernel
> and the WARN is observed with the stable kernel also. It appears that the
> possible deadlock scenario has been hidden for a long time and the commit
> revealed it now.

Right.

> I tried to understand the deadlock scenario, but cannot tell if it is for
> real, or a false positive. Help to understand the scenario will be
> appreciated. The lockdep splat mentions three locks (ctrl->namespaces_rwsem,
> dev->shutdown_lock and q->timeout_work).

Right. This is a circular locking problem which involves three locks.

  (A) (work_completion)(&q->timeout_work) --> &dev->shutdown_lock
  (B) &dev->shutdown_lock --> &ctrl->namespaces_rwsem
  (C) &ctrl->namespaces_rwsem --> (work_completion)(&q->timeout_work)

(A) and (B) happen due to

  INIT_WORK(&q->timeout_work, blk_mq_timeout_work);

  blk_mq_timeout_work(work) { // Is blocking __flush_work() from cancel_work_sync().
    blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
      bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
        blk_mq_rq_timed_out() {
          req->q->mq_ops->timeout(req) == nvme_timeout(req) {
            nvme_dev_disable() {
              mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock.
              nvme_start_freeze(&dev->ctrl) {
                down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
                up_read(&ctrl->namespaces_rwsem);
              }
              mutex_unlock(&dev->shutdown_lock);
            }
          }
        }
      }
    }
  }

(C) happens due to

  nvme_sync_queues(ctrl) {
    nvme_sync_io_queues(ctrl) {
      down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
      /*** Question comes here. Please check the last block in this mail. ***/
      blk_sync_queue(ns->queue) {
        cancel_work_sync(&q->timeout_work) {
          __flush_work(&q->timeout_work, true) { // Might wait for completion of blk_mq_timeout_work() with ctrl->namespaces_rwsem held.
          }
        }
      }
      up_read(&ctrl->namespaces_rwsem);
    }
  }

> In the related function call paths, it looks like
> ctrl->namespaces_rwsem is locked only for read, so the deadlock scenario does
> not look possible to me.

Right.

> (Maybe I'm misunderstanding something...)

Although ctrl->namespaces_rwsem is a rw-sem and is held only for read in these
paths, there are paths which hold ctrl->namespaces_rwsem for write. If someone
is trying to hold that rw-sem for write, these down_read(&ctrl->namespaces_rwsem)
calls will be blocked, because a rw-sem does not grant new read locks while a
writer is waiting (otherwise the writer could starve). That is, whether the
deadlock is possible also depends on whether down_read(&ctrl->namespaces_rwsem)
and down_write(&ctrl->namespaces_rwsem) might run in parallel: with a writer
queued, the down_read() in path (B) waits for the writer, the writer waits for
the up_read() in path (C), and path (C) waits in cancel_work_sync() for the
timeout work in path (A) to complete, closing the circle.

Question: Does someone else call down_write(&ctrl->namespaces_rwsem) immediately
after someone's down_read(&ctrl->namespaces_rwsem) succeeded?
nvme_alloc_ns()/nvme_ns_remove()/nvme_remove_invalid_namespaces()/nvme_remove_namespaces()
call down_write(&ctrl->namespaces_rwsem), and some of these are called from work
callback functions which might run in parallel with other work callback functions?
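
For illustration only, below is a minimal userspace sketch of the
reader-blocks-behind-a-waiting-writer behavior described above. This is not
kernel code: it approximates the kernel rw-sem with a glibc rwlock configured
for writer preference, the thread names are made up, and the sleep()-based
ordering is just to make the interleaving reproducible. Build with
"gcc -pthread demo.c".

  /* Sketch: a waiting writer blocks a later down_read()-style rdlock. */
  #define _GNU_SOURCE
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_rwlock_t rwsem;

  static void *writer(void *arg)
  {
          (void)arg;
          printf("writer: waiting (reader1 holds the lock)\n");
          pthread_rwlock_wrlock(&rwsem);  /* queues behind reader1 */
          printf("writer: got write lock\n");
          pthread_rwlock_unlock(&rwsem);
          return NULL;
  }

  static void *reader2(void *arg)
  {
          (void)arg;
          printf("reader2: calling rdlock...\n");
          /* With writer preference, this read lock blocks until the
           * queued writer has come and gone, even though reader1 only
           * holds the lock for read. */
          pthread_rwlock_rdlock(&rwsem);
          printf("reader2: got read lock (only after the writer ran)\n");
          pthread_rwlock_unlock(&rwsem);
          return NULL;
  }

  int main(void)
  {
          pthread_t w, r2;
          pthread_rwlockattr_t attr;

          pthread_rwlockattr_init(&attr);
          pthread_rwlockattr_setkind_np(&attr,
                  PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
          pthread_rwlock_init(&rwsem, &attr);

          pthread_rwlock_rdlock(&rwsem);          /* reader1 (this thread) */
          pthread_create(&w, NULL, writer, NULL);
          sleep(1);                               /* let the writer queue up */
          pthread_create(&r2, NULL, reader2, NULL);
          sleep(1);                               /* reader2 is now blocked */
          pthread_rwlock_unlock(&rwsem);          /* reader1 releases */

          pthread_join(w, NULL);
          pthread_join(r2, NULL);
          return 0;
  }

If reader2's rdlock can block like this, then by the same logic the
down_read() inside nvme_dev_disable() can block behind a pending down_write()
while dev->shutdown_lock is held and the timeout work is running, which is
what would turn the three read-side paths above into a real deadlock rather
than a false positive.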