On Tue, Sep 19, 2017 at 06:42:30PM +0000, Bart Van Assche wrote:
> On Wed, 2017-09-20 at 00:55 +0800, Ming Lei wrote:
> > On Wed, Sep 20, 2017 at 12:49 AM, Bart Van Assche
> > <Bart.VanAssche@xxxxxxx> wrote:
> > > On Wed, 2017-09-20 at 00:04 +0800, Ming Lei wrote:
> > > > Run queue at end_io is definitely wrong, because blk-mq has SCHED_RESTART
> > > > to do that already.
> > >
> > > Sorry but I disagree. If SCHED_RESTART is set that causes the blk-mq core to
> > > reexamine the software queues and the hctx dispatch list but not the requeue
> > > list. If a block driver returns BLK_STS_RESOURCE then requests end up on the
> > > requeue list. Hence the following code in scsi_end_request():
> >
> > That doesn't need SCHED_RESTART, because it is requeue's
> > responsibility to do that,
> > see blk_mq_requeue_work(), which will run hw queue at the end of this func.
>
> That's not what I was trying to explain. What I was trying to explain is that
> every block driver that can cause a request to end up on the requeue list is
> responsible for kicking the requeue list at a later time. Hence the
> kblockd_schedule_work(&sdev->requeue_work) call in the SCSI core and the
> blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() calls in the
> dm code. What I would like to see is measurement results for dm-mpath without
> this patch series and a call to kick the requeue list added to the dm-mpath
> end_io code.

On this point, SCSI and dm-rq are not the same.

We don't need to run the queue in dm's .end_io. If we did, the symptom
would be an I/O hang rather than a performance issue. The reasoning is
simple:

1) every dm-rq request is mapped 1:1 to a SCSI request

2) if any mapped SCSI request has not finished, whether it is in flight,
on the requeue list or anywhere else, there is one corresponding dm-rq
request in flight

3) once the mapped SCSI request completes, dm-rq's completion path is
triggered and dm-rq's queue is rerun because of SCHED_RESTART in dm-rq

So the hw queue of dm-rq has already been run in dm-rq's completion
path, right? Why do we need to run it again in the hot path?

--
Ming
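
P.S. To make the three points above concrete, here is a toy user-space model. It is not kernel code: struct toy_queue and the toy_* helpers are made up for illustration, and the single "budget" counter is an assumption standing in for whatever limits dispatch to the underlying queue. It only sketches the idea that a restart flag checked in the completion path is enough to drain the waiting requests, without an extra queue run from .end_io.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_queue {
	bool sched_restart;	/* models the SCHED_RESTART flag */
	int waiting;		/* requests not dispatched yet */
	int inflight;		/* requests handed to the lower layer */
	int budget;		/* assumed limit on in-flight requests */
	int runs;		/* number of times the queue was run */
};

/* Dispatch as many waiting requests as the lower layer accepts. */
static void toy_run_queue(struct toy_queue *q)
{
	q->runs++;
	while (q->waiting > 0 && q->inflight < q->budget) {
		q->waiting--;
		q->inflight++;
	}
	/* Work left behind: ask the completion path to rerun the queue. */
	q->sched_restart = q->waiting > 0;
}

/* Completion of one mapped lower-layer request. */
static void toy_complete_one(struct toy_queue *q)
{
	assert(q->inflight > 0);
	q->inflight--;
	/* The only rerun happens here; there is no separate kick. */
	if (q->sched_restart)
		toy_run_queue(q);
}

/*
 * Submit n requests, then complete in-flight requests until none are
 * left. Returns the number of requests still waiting (stranded).
 */
static int toy_drain(struct toy_queue *q, int n)
{
	q->waiting += n;
	toy_run_queue(q);
	while (q->inflight > 0)
		toy_complete_one(q);
	return q->waiting;
}
```

With budget = 2 and five submitted requests, toy_drain() returns 0: as long as something is in flight, its completion reruns the queue, so nothing is left stranded. The model deliberately ignores requeue lists, multiple hardware queues and locking.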