On 2022/10/05 17:31, Shinichiro Kawasaki wrote:
> @@ -5120,11 +5120,27 @@ EXPORT_SYMBOL_GPL(nvme_start_admin_queue);
>  void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
>  {
>  	struct nvme_ns *ns;
> +	LIST_HEAD(splice);
>  
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> +	/*
> +	 * A blk_sync_queue() call inside the ctrl->namespaces_rwsem critical
> +	 * section triggers a deadlock warning from lockdep, since
> +	 * cancel_work_sync() in blk_sync_queue() waits for completion of the
> +	 * nvme_timeout() work, which may take ctrl->namespaces_rwsem. To
> +	 * avoid the deadlock possibility, call blk_sync_queue() outside the
> +	 * critical section by temporarily moving the ctrl->namespaces list
> +	 * elements to a list head on the stack.
> +	 */
> +
> +	down_write(&ctrl->namespaces_rwsem);
> +	list_splice_init(&ctrl->namespaces, &splice);
> +	up_write(&ctrl->namespaces_rwsem);

Does this work? ctrl->namespaces being empty when blk_sync_queue() is called
means that e.g. nvme_start_freeze() cannot find any namespaces to freeze,
doesn't it?

  blk_mq_timeout_work(work) { // Is blocking __flush_work() from cancel_work_sync().
    blk_mq_queue_tag_busy_iter(blk_mq_check_expired) {
      bt_for_each(blk_mq_check_expired) == blk_mq_check_expired() {
        blk_mq_rq_timed_out() {
          req->q->mq_ops->timeout(req) == nvme_timeout(req) {
            nvme_dev_disable() {
              mutex_lock(&dev->shutdown_lock); // Holds dev->shutdown_lock.
              nvme_start_freeze(&dev->ctrl) {
                down_read(&ctrl->namespaces_rwsem); // Holds ctrl->namespaces_rwsem, which might block.
                //blk_freeze_queue_start(ns->queue); // <= Never called because ctrl->namespaces is empty.
                up_read(&ctrl->namespaces_rwsem);
              }
              mutex_unlock(&dev->shutdown_lock);
            }
          }
        }
      }
    }
  }

Are you sure that down_read(&ctrl->namespaces_rwsem) users won't run while
ctrl->namespaces is temporarily made empty? (And if you are sure of that,
why does ctrl->namespaces_rwsem need to be a rw-sem rather than a plain
mutex or spinlock?)

> +
> +	list_for_each_entry(ns, &splice, list)
>  		blk_sync_queue(ns->queue);
> -	up_read(&ctrl->namespaces_rwsem);
> +
> +	down_write(&ctrl->namespaces_rwsem);
> +	list_splice(&splice, &ctrl->namespaces);
> +	up_write(&ctrl->namespaces_rwsem);
>  }
>  EXPORT_SYMBOL_GPL(nvme_sync_io_queues);

I don't know the full dependency chain, but you might be able to add a
"struct nvme_ctrl"->sync_io_queue_mutex that is held to serialize
nvme_sync_io_queues() against the down_write(&ctrl->namespaces_rwsem) users.
If we can guarantee that the ctrl->namespaces_rwsem =>
ctrl->sync_io_queue_mutex lock ordering is impossible, nvme_sync_io_queues()
can use ctrl->sync_io_queue_mutex rather than ctrl->namespaces_rwsem, and the
down_write(&ctrl->namespaces_rwsem)/up_write(&ctrl->namespaces_rwsem) users
can be replaced with

  mutex_lock(&ctrl->sync_io_queue_mutex);
  down_write(&ctrl->namespaces_rwsem);

and

  up_write(&ctrl->namespaces_rwsem);
  mutex_unlock(&ctrl->sync_io_queue_mutex);

sequences, respectively.
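
To make that concrete, here is a rough, untested sketch of what I have in
mind (the sync_io_queue_mutex field and the helper names are made up for
illustration; I have not audited every down_write() caller for ordering):

  /* Hypothetical new field in struct nvme_ctrl, as proposed above. */
  struct mutex sync_io_queue_mutex; /* serializes list mutation vs. queue sync */

  /*
   * Wrappers for every down_write()/up_write() user, so that namespace-list
   * mutation is also serialized against nvme_sync_io_queues().
   * Lock order: sync_io_queue_mutex => namespaces_rwsem, never the reverse.
   */
  static void nvme_namespaces_lock(struct nvme_ctrl *ctrl)
  {
          mutex_lock(&ctrl->sync_io_queue_mutex);
          down_write(&ctrl->namespaces_rwsem);
  }

  static void nvme_namespaces_unlock(struct nvme_ctrl *ctrl)
  {
          up_write(&ctrl->namespaces_rwsem);
          mutex_unlock(&ctrl->sync_io_queue_mutex);
  }

  void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
  {
          struct nvme_ns *ns;

          /*
           * Holding only sync_io_queue_mutex keeps the list stable here
           * while leaving namespaces_rwsem free, so cancel_work_sync()
           * inside blk_sync_queue() can wait for nvme_timeout() work that
           * takes down_read(&ctrl->namespaces_rwsem) without deadlocking.
           */
          mutex_lock(&ctrl->sync_io_queue_mutex);
          list_for_each_entry(ns, &ctrl->namespaces, list)
                  blk_sync_queue(ns->queue);
          mutex_unlock(&ctrl->sync_io_queue_mutex);
  }

With this approach ctrl->namespaces is never made temporarily empty, so
down_read(&ctrl->namespaces_rwsem) users such as nvme_start_freeze() keep
seeing the namespaces, and readers are not blocked while the queues are
being synced.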