Hi John,

It has been reported inside RH that CPU utilization increases by ~20% when
running a simple fio test inside a VM whose disk is backed by an image
stored on XFS/megaraid_sas.

While investigating by reproducing the issue via scsi_debug, I found an IO
hang when running randread IO (8k, direct IO, libaio) against a scsi_debug
disk created by the following command:

	modprobe scsi_debug host_max_queue=128 submit_queues=$NR_CPUS virtual_gb=256

It looks like the hang is caused by SCHED_RESTART: currently the restart is
only run on the current hctx, while for shared tags we may need to restart
all hctxs. The hang can be fixed by the patch appended below. However, IOPS
drops by more than 10% with the patch applied.

Any ideas about this issue and the original performance drop?

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index e1e997af89a0..45188f7aa789 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -59,10 +59,18 @@ EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
+	bool shared_tag = blk_mq_is_sbitmap_shared(hctx->flags);
+
+	if (shared_tag)
+		blk_mq_run_hw_queues(hctx->queue, true);
+
 	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
 		return;
 
 	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
+	if (shared_tag)
+		return;
+
 	/*
 	 * Order clearing SCHED_RESTART and list_empty_careful(&hctx->dispatch)
 	 * in blk_mq_run_hw_queue(). Its pair is the barrier in

Thanks,
Ming
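
P.S. For reference, a fio invocation along the following lines matches the
workload described above (randread, 8k, direct IO, libaio). The device name,
iodepth, job count and runtime here are illustrative placeholders, not the
exact values from my test:

	fio --name=randread-test --filename=/dev/sdXX --rw=randread --bs=8k \
		--direct=1 --ioengine=libaio --iodepth=64 --numjobs=$NR_CPUS \
		--runtime=30 --time_based --group_reporting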