On 2022/09/30 9:19, Shinichiro Kawasaki wrote:
> Kernel v6.0-rc7 triggered the WARN "possible circular locking dependency
> detected" at blktests test case block/011 for NVME devices [1]. The trigger
> commit was c0feea594e05 ("workqueue: don't skip lockdep work dependency in
> cancel_work_sync()"). The commit was backported to the v5.15.71 stable kernel
> and the WARN is observed with the stable kernel also. It appears that the
> possible deadlock scenario has been hidden for a long time and the commit
> revealed it now.

Right.

> I tried to understand the deadlock scenario, but cannot tell if it is for
> real, or a false positive. Help to understand the scenario will be
> appreciated. The lockdep splat mentions three locks (ctrl->namespaces_rwsem,
> dev->shutdown_lock and q->timeout_work).

Right. This is a circular locking problem which involves three locks.

  (A) (work_completion)(&q->timeout_work) --> &dev->shutdown_lock
  (B) &dev->shutdown_lock --> &ctrl->namespaces_rwsem
  (C) &ctrl->namespaces_rwsem --> (work_completion)(&q->timeout_work)

(A) and (B) happen due to

  INIT_WORK(&q->timeout_work, blk_mq_timeout_work);

  blk_mq_timeout_work(work) { // Is blocking __flush_work() from cancel_work_sync().
    blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
      bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
        blk_mq_rq_timed_out() {
          req->q->mq_ops->timeout(req) == nvme_timeout(req) {
            nvme_dev_disable() {
              mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock.
              nvme_start_freeze(&dev->ctrl) {
                down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
                up_read(&ctrl->namespaces_rwsem);
              }
              mutex_unlock(&dev->shutdown_lock);
            }
          }
        }
      }
    }
  }

(C) happens due to

  nvme_sync_queues(ctrl) {
    nvme_sync_io_queues(ctrl) {
      down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
      /*** Question comes here. Please check the last block in this mail. ***/
      blk_sync_queue(ns->queue) {
        cancel_work_sync(&q->timeout_work) {
          __flush_work(&q->timeout_work, true) { // Might wait for completion of blk_mq_timeout_work() with ctrl->namespaces_rwsem held.
          }
        }
      }
      up_read(&ctrl->namespaces_rwsem);
    }
  }

> In the related function call paths, it looks like
> ctrl->namespaces_rwsem is locked only for read, so the deadlock scenario does
> not look possible to me.

Right.

> (Maybe I'm misunderstanding something...)

Although ctrl->namespaces_rwsem is a rw-sem and is held only for read in these
paths, there are paths which hold ctrl->namespaces_rwsem for write. If someone
is trying to hold that rw-sem for write, these down_read(&ctrl->namespaces_rwsem)
calls will be blocked, because a rw-sem does not grant new read locks while a
writer is waiting (otherwise the writer could starve). That is, whether the
deadlock is possible also depends on whether down_read(&ctrl->namespaces_rwsem)
and down_write(&ctrl->namespaces_rwsem) might run in parallel: with a writer
queued, the down_read() in path (B) waits for the writer, the writer waits for
the up_read() in path (C), and path (C) waits in cancel_work_sync() for the
timeout work in path (A) to complete, closing the circle.

Question: Does someone else call down_write(&ctrl->namespaces_rwsem) immediately
after someone's down_read(&ctrl->namespaces_rwsem) succeeded?
nvme_alloc_ns()/nvme_ns_remove()/nvme_remove_invalid_namespaces()/nvme_remove_namespaces()
call down_write(&ctrl->namespaces_rwsem), and some of these are called from work
callback functions which might run in parallel with other work callback functions?
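
For illustration only, below is a minimal userspace sketch of the
reader-blocks-behind-a-waiting-writer behavior described above. This is not
kernel code: it approximates the kernel rw-sem with a glibc rwlock configured
for writer preference, the thread names are made up, and the sleep()-based
ordering is just to make the interleaving reproducible. Build with
"gcc -pthread demo.c".

  /* Sketch: a waiting writer blocks a later down_read()-style rdlock. */
  #define _GNU_SOURCE
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_rwlock_t rwsem;

  static void *writer(void *arg)
  {
          (void)arg;
          printf("writer: waiting (reader1 holds the lock)\n");
          pthread_rwlock_wrlock(&rwsem);  /* queues behind reader1 */
          printf("writer: got write lock\n");
          pthread_rwlock_unlock(&rwsem);
          return NULL;
  }

  static void *reader2(void *arg)
  {
          (void)arg;
          printf("reader2: calling rdlock...\n");
          /* With writer preference, this read lock blocks until the
           * queued writer has come and gone, even though reader1 only
           * holds the lock for read. */
          pthread_rwlock_rdlock(&rwsem);
          printf("reader2: got read lock (only after the writer ran)\n");
          pthread_rwlock_unlock(&rwsem);
          return NULL;
  }

  int main(void)
  {
          pthread_t w, r2;
          pthread_rwlockattr_t attr;

          pthread_rwlockattr_init(&attr);
          pthread_rwlockattr_setkind_np(&attr,
                  PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
          pthread_rwlock_init(&rwsem, &attr);

          pthread_rwlock_rdlock(&rwsem);          /* reader1 (this thread) */
          pthread_create(&w, NULL, writer, NULL);
          sleep(1);                               /* let the writer queue up */
          pthread_create(&r2, NULL, reader2, NULL);
          sleep(1);                               /* reader2 is now blocked */
          pthread_rwlock_unlock(&rwsem);          /* reader1 releases */

          pthread_join(w, NULL);
          pthread_join(r2, NULL);
          return 0;
  }

If reader2's rdlock can block like this, then by the same logic the
down_read() inside nvme_dev_disable() can block behind a pending down_write()
while dev->shutdown_lock is held and the timeout work is running, which is
what would turn the three read-side paths above into a real deadlock rather
than a false positive.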