To avoid triggering an I/O timeout when one hctx becomes inactive, we
drain in-flight I/O when all CPUs of the hctx are offline. However, a
driver's timeout handler may require cpus_read_lock(): for example,
nvme-pci calls pci_alloc_irq_vectors_affinity() in its reset context,
and irq_build_affinity_masks() needs cpus_read_lock(). Meanwhile,
cpus_write_lock is held when blk-mq's cpuhp offline handler is called,
so a deadlock results.

Fix the issue by breaking out of the wait loop once a long enough time
has elapsed; any in-flight I/O that has not been drained can still be
handled by the timeout handler.

Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx
Reported-by: Yi Zhang <yi.zhang@xxxxxxxxxx>
Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
 block/blk-mq.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c96c8c4f751b..4585985b8537 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3301,6 +3301,7 @@ static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
 	return true;
 }
 
+#define BLK_MQ_MAX_OFFLINE_WAIT_MSECS 3000
 static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
 {
 	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
@@ -3326,8 +3327,13 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
 	 * frozen and there are no requests.
 	 */
 	if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
-		while (blk_mq_hctx_has_requests(hctx))
+		unsigned int wait_ms = 0;
+
+		while (blk_mq_hctx_has_requests(hctx) && wait_ms <
+				BLK_MQ_MAX_OFFLINE_WAIT_MSECS) {
 			msleep(5);
+			wait_ms += 5;
+		}
 		percpu_ref_put(&hctx->queue->q_usage_counter);
 	}
 
-- 
2.31.1