Re: [PATCH V2] scsi: core: only re-run queue in scsi_end_request() if device queue is busy

Ming Lei <ming.lei@xxxxxxxxxx> · Tue, 19 Nov 2019 10:20:58 +0800

On Mon, Nov 18, 2019 at 03:40:06PM -0800, Bart Van Assche wrote:
> On 11/18/19 2:06 AM, Ming Lei wrote:
> > Now the request queue is run in scsi_end_request() unconditionally if both
> > target queue and host queue is ready. We should have re-run request queue
> > only after this device queue becomes busy for restarting this LUN only.
> > 
> > Recently Long Li reported that cost of run queue may be very heavy in
> > case of high queue depth. So improve this situation by only running
> > the request queue when this LUN is busy.
> > 
> > Cc: Jens Axboe <axboe@xxxxxxxxx>
> > Cc: Ewan D. Milne <emilne@xxxxxxxxxx>
> > Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
> > Cc: Hannes Reinecke <hare@xxxxxxx>
> > Cc: Bart Van Assche <bvanassche@xxxxxxx>
> > Cc: Damien Le Moal <damien.lemoal@xxxxxxx>
> > Cc: Long Li <longli@xxxxxxxxxxxxx>
> > Cc: linux-block@xxxxxxxxxxxxxxx
> > Reported-by: Long Li <longli@xxxxxxxxxxxxx>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> > V2:
> > 	- commit log change, no any code change
> > 	- add reported-by tag
> > 
> > 
> >   drivers/scsi/scsi_lib.c    | 29 +++++++++++++++++++++++++++--
> >   include/scsi/scsi_device.h |  1 +
> >   2 files changed, 28 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 379533ce8661..62a86a82c38d 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -612,7 +612,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
> >   	if (scsi_target(sdev)->single_lun ||
> >   	    !list_empty(&sdev->host->starved_list))
> >   		kblockd_schedule_work(&sdev->requeue_work);
> > -	else
> > +	else if (READ_ONCE(sdev->restart))
> >   		blk_mq_run_hw_queues(q, true);
> >   	percpu_ref_put(&q->q_usage_counter);
> > @@ -1632,8 +1632,33 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> >   	struct request_queue *q = hctx->queue;
> >   	struct scsi_device *sdev = q->queuedata;
> > -	if (scsi_dev_queue_ready(q, sdev))
> > +	if (scsi_dev_queue_ready(q, sdev)) {
> > +		WRITE_ONCE(sdev->restart, 0);
> >   		return true;
> > +	}
> > +
> > +	/*
> > +	 * If all in-flight requests originated from this LUN are completed
> > +	 * before setting .restart, sdev->device_busy will be observed as
> > +	 * zero, then blk_mq_delay_run_hw_queue() will dispatch this request
> > +	 * soon. Otherwise, completion of one of these request will observe
> > +	 * the .restart flag, and the request queue will be run for handling
> > +	 * this request, see scsi_end_request().
> > +	 *
> > +	 * However, the .restart flag may be cleared from other dispatch code
> > +	 * path after one inflight request is completed, then:
> > +	 *
> > +	 * 1) if this request is dispatched from scheduler queue or sw queue one
> > +	 * by one, this request will be handled in that dispatch path too given
> > +	 * the request still stays at scheduler/sw queue when calling .get_budget()
> > +	 * callback.
> > +	 *
> > +	 * 2) if this request is dispatched from hctx->dispatch or
> > +	 * blk_mq_flush_busy_ctxs(), this request will be put into hctx->dispatch
> > +	 * list soon, and blk-mq will be responsible for covering it, see
> > +	 * blk_mq_dispatch_rq_list().
> > +	 */
> > +	WRITE_ONCE(sdev->restart, 1);
> 
> Hi Ming,
> 
> Are any memory barriers needed?
> 
> Should WRITE_ONCE(sdev->restart, 1) perhaps be moved above the
> scsi_dev_queue_ready()? Consider e.g. the following scenario:
> 
> sdev->restart == 0
> 
> scsi_mq_get_budget() calls scsi_dev_queue_ready() and that last function
> returns false.
> 
> scsi_end_request() calls __blk_mq_end_request()
> scsi_end_request() skips the blk_mq_run_hw_queues() call

Suppose the sdev->restart isn't set as 1 or isn't visible by
scsi_end_request().

> 
> scsi_mq_get_budget() changes sdev->restart into 1.

As the comment mentioned, if there isn't any in-flight requests
originated from this LUN, blk_mq_delay_run_hw_queue() in
scsi_mq_get_budget() will run the hw queue. If there is any
in-flight requests from this LUN, that request's scsi_end_request()
will handle that.

Then looks one barrier is required between 'WRITE_ONCE(sdev->restart, 1)'
and 'atomic_read(&sdev->device_busy) == 0'.

And its pair is scsi_device_unbusy() and READ_ONCE(sdev->restart).
The barrier between the pair could be implied by __blk_mq_end_request(),
either __blk_mq_free_request() or rq->end_io.

> 
> Can this race happen with the above patch applied? Will this scenario result
> in a queue stall?

If barrier is added between 'WRITE_ONCE(sdev->restart, 1)' and
'atomic_read(&sdev->device_busy) == 0', the race should be avoided.

Will do that in V3.

Thanks,
Ming