On Mon, Nov 18, 2019 at 03:40:06PM -0800, Bart Van Assche wrote: > On 11/18/19 2:06 AM, Ming Lei wrote: > > Now the request queue is run in scsi_end_request() unconditionally if both > > target queue and host queue is ready. We should have re-run request queue > > only after this device queue becomes busy for restarting this LUN only. > > > > Recently Long Li reported that cost of run queue may be very heavy in > > case of high queue depth. So improve this situation by only running > > the request queue when this LUN is busy. > > > > Cc: Jens Axboe <axboe@xxxxxxxxx> > > Cc: Ewan D. Milne <emilne@xxxxxxxxxx> > > Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> > > Cc: Hannes Reinecke <hare@xxxxxxx> > > Cc: Bart Van Assche <bvanassche@xxxxxxx> > > Cc: Damien Le Moal <damien.lemoal@xxxxxxx> > > Cc: Long Li <longli@xxxxxxxxxxxxx> > > Cc: linux-block@xxxxxxxxxxxxxxx > > Reported-by: Long Li <longli@xxxxxxxxxxxxx> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > > --- > > V2: > > - commit log change, no any code change > > - add reported-by tag > > > > > > drivers/scsi/scsi_lib.c | 29 +++++++++++++++++++++++++++-- > > include/scsi/scsi_device.h | 1 + > > 2 files changed, 28 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index 379533ce8661..62a86a82c38d 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -612,7 +612,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error, > > if (scsi_target(sdev)->single_lun || > > !list_empty(&sdev->host->starved_list)) > > kblockd_schedule_work(&sdev->requeue_work); > > - else > > + else if (READ_ONCE(sdev->restart)) > > blk_mq_run_hw_queues(q, true); > > percpu_ref_put(&q->q_usage_counter); > > @@ -1632,8 +1632,33 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) > > struct request_queue *q = hctx->queue; > > struct scsi_device *sdev = q->queuedata; > > - if (scsi_dev_queue_ready(q, sdev)) > > + if (scsi_dev_queue_ready(q, sdev)) { > > + WRITE_ONCE(sdev->restart, 0); > > return true; > > + } > > + > > + /* > > + * If all in-flight requests originated from this LUN are completed > > + * before setting .restart, sdev->device_busy will be observed as > > + * zero, then blk_mq_delay_run_hw_queue() will dispatch this request > > + * soon. Otherwise, completion of one of these request will observe > > + * the .restart flag, and the request queue will be run for handling > > + * this request, see scsi_end_request(). > > + * > > + * However, the .restart flag may be cleared from other dispatch code > > + * path after one inflight request is completed, then: > > + * > > + * 1) if this request is dispatched from scheduler queue or sw queue one > > + * by one, this request will be handled in that dispatch path too given > > + * the request still stays at scheduler/sw queue when calling .get_budget() > > + * callback. > > + * > > + * 2) if this request is dispatched from hctx->dispatch or > > + * blk_mq_flush_busy_ctxs(), this request will be put into hctx->dispatch > > + * list soon, and blk-mq will be responsible for covering it, see > > + * blk_mq_dispatch_rq_list(). > > + */ > > + WRITE_ONCE(sdev->restart, 1); > > Hi Ming, > > Are any memory barriers needed? > > Should WRITE_ONCE(sdev->restart, 1) perhaps be moved above the > scsi_dev_queue_ready()? Consider e.g. the following scenario: > > sdev->restart == 0 > > scsi_mq_get_budget() calls scsi_dev_queue_ready() and that last function > returns false. > > scsi_end_request() calls __blk_mq_end_request() > scsi_end_request() skips the blk_mq_run_hw_queues() call Suppose the sdev->restart isn't set as 1 or isn't visible by scsi_end_request(). > > scsi_mq_get_budget() changes sdev->restart into 1. As the comment mentioned, if there isn't any in-flight requests originated from this LUN, blk_mq_delay_run_hw_queue() in scsi_mq_get_budget() will run the hw queue. If there is any in-flight requests from this LUN, that request's scsi_end_request() will handle that. Then looks one barrier is required between 'WRITE_ONCE(sdev->restart, 1)' and 'atomic_read(&sdev->device_busy) == 0'. And its pair is scsi_device_unbusy() and READ_ONCE(sdev->restart). The barrier between the pair could be implied by __blk_mq_end_request(), either __blk_mq_free_request() or rq->end_io. > > Can this race happen with the above patch applied? Will this scenario result > in a queue stall? If barrier is added between 'WRITE_ONCE(sdev->restart, 1)' and 'atomic_read(&sdev->device_busy) == 0', the race should be avoided. Will do that in V3. Thanks, Ming