On Tue, Sep 20, 2022 at 10:17:24AM +0800, Ming Lei wrote: > For avoiding to trigger io timeout when one hctx becomes inactive, we > drain IOs when all CPUs of one hctx are offline. However, driver's > timeout handler may require cpus_read_lock, such as nvme-pci, > pci_alloc_irq_vectors_affinity() is called in nvme-pci reset context, > and irq_build_affinity_masks() needs cpus_read_lock(). > > Meantime when blk-mq's cpuhp offline handler is called, cpus_write_lock > is held, so deadlock is caused. > > Fixes the issue by breaking the wait loop if enough long time elapses, > and these in-flight not drained IO still can be handled by timeout > handler. I'm not sure that this actually is a good idea on its own, and it kinda defeats the cpu hotplug processing. So if I understand your log above correctly the problem is that we have commands that would time out, and we exacalate to a controller reset that is racing with the CPU unplug, or if not can you explain the scenario a little more? > Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx > Reported-by: Yi Zhang <yi.zhang@xxxxxxxxxx> > Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline") > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > block/blk-mq.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index c96c8c4f751b..4585985b8537 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -3301,6 +3301,7 @@ static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu, > return true; > } > > +#define BLK_MQ_MAX_OFFLINE_WAIT_MSECS 3000 > static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node) > { > struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node, > @@ -3326,8 +3327,13 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node) > * frozen and there are no requests. > */ > if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) { > - while (blk_mq_hctx_has_requests(hctx)) > + unsigned int wait_ms = 0; > + > + while (blk_mq_hctx_has_requests(hctx) && wait_ms < > + BLK_MQ_MAX_OFFLINE_WAIT_MSECS) { > msleep(5); > + wait_ms += 5; > + } > percpu_ref_put(&hctx->queue->q_usage_counter); > } > > -- > 2.31.1 ---end quoted text---