Re: [PATCH] blk-mq: avoid to hang in the cpuhp offline handler

On 20/09/2022 03:17, Ming Lei wrote:
To avoid triggering an io timeout when one hctx becomes inactive, we
drain IOs when all CPUs of the hctx are offline. However, a driver's
timeout handler may require cpus_read_lock; nvme-pci, for example,
calls pci_alloc_irq_vectors_affinity() in its reset context, and
irq_build_affinity_masks() needs cpus_read_lock().

Meanwhile, when blk-mq's cpuhp offline handler is called,
cpus_write_lock is held, so a deadlock is caused.

Fix the issue by breaking the wait loop once a long enough time has
elapsed; the in-flight IOs that were not drained can still be handled
by the timeout handler.
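
For reference, the dependency cycle described above can be modelled in
a few lines of userspace C - just an illustrative sketch, with a
pthread rwlock standing in for cpus_read_lock()/cpus_write_lock() and
an atomic counter standing in for the hctx in-flight request count
(the offline_handler/reset_handler names below are made up for the
example and do not exist in the kernel):

/*
 * Userspace model of the reported deadlock: the rwlock plays the role
 * of the cpu hotplug lock, the counter plays the role of the in-flight
 * requests on the dying hctx.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t cpu_hotplug_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int inflight = 1;	/* one request stuck on the dying hctx */

/* cpuhp offline path: holds the write lock while draining the hctx */
static void *offline_handler(void *arg)
{
	int waited_ms = 0;

	(void)arg;
	pthread_rwlock_wrlock(&cpu_hotplug_lock);	/* cpus_write_lock() */
	while (atomic_load(&inflight) > 0 && waited_ms < 3000) {
		usleep(5000);				/* msleep(5) */
		waited_ms += 5;
	}
	pthread_rwlock_unlock(&cpu_hotplug_lock);	/* cpus_write_unlock() */
	printf("offline handler done, %d request(s) left\n",
	       atomic_load(&inflight));
	return NULL;
}

/* timeout/reset path: needs the read lock before it can reap the request */
static void *reset_handler(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&cpu_hotplug_lock);	/* cpus_read_lock() */
	atomic_fetch_sub(&inflight, 1);			/* complete the request */
	pthread_rwlock_unlock(&cpu_hotplug_lock);
	return NULL;
}

int main(void)
{
	pthread_t off, rst;

	pthread_create(&off, NULL, offline_handler, NULL);
	usleep(100000);		/* let the offline handler win the lock */
	pthread_create(&rst, NULL, reset_handler, NULL);

	pthread_join(off, NULL);
	pthread_join(rst, NULL);
	return 0;
}

Without the 3000 ms cap the "offline" thread never drops the write
lock, so the "reset" thread never gets the read lock and the in-flight
count never reaches zero; with the cap, the offline thread eventually
gives up and the reset thread can make progress.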

I don't think that this is a good idea - that is because drivers often cannot safely handle the scenario of a timeout of an IO which has actually completed. The NVMe timeout handler may poll for completion, but SCSI does not.

Indeed, if we were going to allow the timeout handler to handle these in-flight IOs, then there would be no point in having this hotplug handler in the first place.


Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx
Reported-by: Yi Zhang <yi.zhang@xxxxxxxxxx>
Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
  block/blk-mq.c | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c96c8c4f751b..4585985b8537 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3301,6 +3301,7 @@ static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
  	return true;
  }
+#define BLK_MQ_MAX_OFFLINE_WAIT_MSECS 3000

That's very low compared to the default SCSI sd command timeout of 30 seconds.

  static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
  {
  	struct blk_mq_hw_ctx *hctx = hlist_entry_safe(node,
@@ -3326,8 +3327,13 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
  	 * frozen and there are no requests.
  	 */
  	if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
-		while (blk_mq_hctx_has_requests(hctx))
+		unsigned int wait_ms = 0;
+
+		while (blk_mq_hctx_has_requests(hctx) && wait_ms <
+				BLK_MQ_MAX_OFFLINE_WAIT_MSECS) {
  			msleep(5);
+			wait_ms += 5;
+		}
  		percpu_ref_put(&hctx->queue->q_usage_counter);
  	}

Thanks,
John