If .queue_rq() returns BLK_STS_RESOURCE, blk-mq will rerun the queue in
one of three situations:

1) if BLK_MQ_S_SCHED_RESTART is set
- queue is rerun after one rq is completed, see blk_mq_sched_restart()
which is run from blk_mq_free_request()

2) run out of driver tag
- queue is rerun after one tag is freed

3) otherwise
- queue is run immediately in blk_mq_dispatch_rq_list()

This random delay of running the hw queue was introduced by commit
6077c2d706097c0 (dm rq: Avoid that request processing stalls
sporadically), which claimed to fix a request processing stall but
never explained the idea behind it; it is a workaround at most. Nor has
anyone explained the issue in the recent discussion.

Also, calling blk_mq_delay_run_hw_queue() inside .queue_rq() is a
horrible hack because it stops BLK_MQ_S_SCHED_RESTART from working and
degrades I/O performance a lot.

Finally, this patch makes sure that dm-rq returns BLK_STS_RESOURCE to
blk-mq only when the underlying queue is out of resources, so
multipath_clone_and_map() now returns DM_MAPIO_DELAY_REQUEUE if either
MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED is set.
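For illustration only, the three rerun situations listed above can be
modelled in userspace C roughly as follows. This is a sketch, not kernel
code: struct hctx_model, enum rerun_trigger and rerun_after_resource()
are hypothetical names invented for this example, and the precedence
between cases 1) and 2) is assumed from the order of the list.

```c
#include <stdbool.h>

/* Hypothetical model: when does blk-mq rerun the hw queue after
 * .queue_rq() returned BLK_STS_RESOURCE? None of these names exist
 * in the kernel. */
enum rerun_trigger {
	RERUN_ON_RQ_COMPLETION,	/* case 1: after one rq is completed */
	RERUN_ON_TAG_FREE,	/* case 2: after one driver tag is freed */
	RERUN_IMMEDIATELY,	/* case 3: rerun right away */
};

struct hctx_model {
	bool sched_restart;		/* stands in for BLK_MQ_S_SCHED_RESTART */
	bool out_of_driver_tags;	/* stands in for "run out of driver tag" */
};

/* Assumption: the cases are checked in the order the list gives them. */
static enum rerun_trigger rerun_after_resource(const struct hctx_model *h)
{
	if (h->sched_restart)
		return RERUN_ON_RQ_COMPLETION;
	if (h->out_of_driver_tags)
		return RERUN_ON_TAG_FREE;
	return RERUN_IMMEDIATELY;
}
```

In this model none of the three cases involves a fixed timer, which is
why the 100ms blk_mq_delay_run_hw_queue() call in dm_mq_queue_rq() looks
redundant next to them.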

Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
 drivers/md/dm-mpath.c | 5 ++---
 drivers/md/dm-rq.c    | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index f7810cc869ac..86bf502a8e51 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -516,9 +516,8 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq,
 		return DM_MAPIO_KILL;
 	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
 		   test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
-		if (pg_init_all_paths(m))
-			return DM_MAPIO_DELAY_REQUEUE;
-		return DM_MAPIO_REQUEUE;
+		pg_init_all_paths(m);
+		return DM_MAPIO_DELAY_REQUEUE;
 	}
 
 	memset(mpio, 0, sizeof(*mpio));
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 3b319776d80c..4d157b14d302 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -755,7 +755,6 @@ static blk_status_t dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
 		/* Undo dm_start_request() before requeuing */
 		rq_end_stats(md, rq);
 		rq_completed(md, rq_data_dir(rq), false);
-		blk_mq_delay_run_hw_queue(hctx, 100/*ms*/);
 		return BLK_STS_RESOURCE;
 	}
 
-- 
2.9.5