On Fri, Apr 14, 2017 at 05:12:50PM +0000, Bart Van Assche wrote:
> On Fri, 2017-04-14 at 09:13 +0800, Ming Lei wrote:
> > On Thu, Apr 13, 2017 at 09:59:57AM -0700, Bart Van Assche wrote:
> > > On 04/12/17 19:20, Ming Lei wrote:
> > > > On Wed, Apr 12, 2017 at 06:38:07PM +0000, Bart Van Assche wrote:
> > > > > If the blk-mq core would always rerun a hardware queue if a block driver
> > > > > returns BLK_MQ_RQ_QUEUE_BUSY then that would cause 100% of a single CPU core
> > > >
> > > > It won't cause 100% CPU utilization since we restart queue in completion
> > > > path and at that time at least one tag is available, then progress can be
> > > > made.
> > >
> > > Hello Ming,
> > >
> > > Sorry but you are wrong. If .queue_rq() returns BLK_MQ_RQ_QUEUE_BUSY
> > > then it's likely that calling .queue_rq() again after only a few
> > > microseconds will cause it to return BLK_MQ_RQ_QUEUE_BUSY again. If you
> > > don't believe me, change "if (!blk_mq_sched_needs_restart(hctx) &&
> > > !test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state)) blk_mq_run_hw_queue(hctx,
> > > true);" into "blk_mq_run_hw_queue(hctx, true);", trigger a busy
> >
> > Yes, that can be true, but I mean it is still OK to run the queue again
> > with
> >
> > 	if (!blk_mq_sched_needs_restart(hctx) &&
> > 	    !test_bit(BLK_MQ_S_TAG_WAITING, &hctx->state))
> > 		blk_mq_run_hw_queue(hctx, true);
> >
> > and restarting queue in __blk_mq_finish_request() when
> > BLK_MQ_RQ_QUEUE_BUSY is returned from .queue_rq(). And both are in current
> > blk-mq implementation.
> >
> > Then why do we need blk_mq_delay_run_hw_queue(hctx, 100/*ms*/) in dm?
>
> Because if dm_mq_queue_rq() returns BLK_MQ_RQ_QUEUE_BUSY then there is no
> guarantee that __blk_mq_finish_request() will be called later on for the
> same queue. dm_mq_queue_rq() can e.g. return BLK_MQ_RQ_QUEUE_BUSY while no
> dm requests are in progress because the SCSI error handler is active for
> all underlying paths. See also scsi_lld_busy() and scsi_host_in_recovery().
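To make the distinction in Bart's explanation concrete, here is a toy user-space model, not kernel code. The names model_hctx, model_handle_busy() and model_complete() are hypothetical and are not part of the blk-mq API. The model assumes only two rerun mechanisms: a restart armed for the completion path (roughly what __blk_mq_finish_request() provides) and a timer-driven rerun (roughly what blk_mq_delay_run_hw_queue() provides in dm). The point it illustrates: a restart on completion only helps when at least one request is actually in flight, so a BUSY return with nothing in flight needs the delayed rerun.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a blk-mq hardware queue; these are not kernel types. */
struct model_hctx {
	int in_flight;       /* requests dispatched to the driver, not yet completed */
	bool restart_armed;  /* a completion should rerun the queue */
	bool delayed_armed;  /* a timer-driven rerun is pending */
};

enum model_rerun { RERUN_ON_COMPLETION, RERUN_DELAYED };

/* Called when the (hypothetical) driver's queue_rq() reports BUSY. */
static enum model_rerun model_handle_busy(struct model_hctx *h)
{
	assert(h->in_flight >= 0);
	if (h->in_flight > 0) {
		/* Some request will complete and can restart the queue,
		 * so rerunning immediately would only busy-loop. */
		h->restart_armed = true;
		return RERUN_ON_COMPLETION;
	}
	/* Nothing is in flight (e.g. every path is in error recovery),
	 * so no completion is guaranteed: fall back to a delayed rerun. */
	h->delayed_armed = true;
	return RERUN_DELAYED;
}

/* Called when a dispatched request completes.
 * Returns true if the queue should be rerun now. */
static bool model_complete(struct model_hctx *h)
{
	assert(h->in_flight > 0);
	h->in_flight--;
	if (h->restart_armed) {
		h->restart_armed = false;
		return true;
	}
	return false;
}
```

With two requests in flight, a BUSY return arms a restart and the first completion reruns the queue. With zero in flight, model_complete() would never be called, which is exactly why the model arms a delayed rerun instead.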
OK, thanks Bart for the explanation. This looks like a very interesting
BLK_MQ_RQ_QUEUE_BUSY case, one that isn't caused by too many pending I/Os,
and I will study it more.

Thanks,
Ming