Hi John,

It has been reported inside RH that CPU utilization increases by ~20% when
running a simple fio test inside a VM whose disk is backed by an image
stored on XFS/megaraid_sas.

While investigating by reproducing the issue via scsi_debug, I found an IO
hang when running randread IO (8k, direct IO, libaio) against a scsi_debug
disk created by the following command:

	modprobe scsi_debug host_max_queue=128 submit_queues=$NR_CPUS virtual_gb=256

It looks like the hang is caused by SCHED_RESTART: currently the restart is
only run on the current hctx, while for shared tags we may need to restart
all hctxs. The hang can be fixed by the patch appended below. However, IOPS
drops by more than 10% with the patch applied.

Any ideas about this issue and the original performance drop?

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e1e997af89a0..45188f7aa789 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -59,10 +59,18 @@ EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
+	bool shared_tag = blk_mq_is_sbitmap_shared(hctx->flags);
+
+	if (shared_tag)
+		blk_mq_run_hw_queues(hctx->queue, true);
+
 	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
 		return;
 
 	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
+	if (shared_tag)
+		return;
+
 	/*
 	 * Order clearing SCHED_RESTART and list_empty_careful(&hctx->dispatch)
 	 * in blk_mq_run_hw_queue(). Its pair is the barrier in

Thanks,
Ming
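
P.S. For reference, a fio invocation along the following lines matches the
workload described above (randread, 8k, direct IO, libaio). The device name,
iodepth, job count and runtime here are illustrative placeholders, not the
exact values from my test:

	fio --name=randread-test --filename=/dev/sdXX --rw=randread --bs=8k \
		--direct=1 --ioengine=libaio --iodepth=64 --numjobs=$NR_CPUS \
		--runtime=30 --time_based --group_reporting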