On Thu, Nov 21, 2019 at 11:26:34AM +0800, Ming Lei wrote: > Now the request queue is run in scsi_end_request() unconditionally if both > target queue and host queue is ready. We should have re-run request queue > only after this device queue becomes busy for restarting this LUN only. > > Recently Long Li reported that cost of run queue may be very heavy in > case of high queue depth. So improve this situation by only running > the request queue when this LUN is busy. > > Cc: Jens Axboe <axboe@xxxxxxxxx> > Cc: Ewan D. Milne <emilne@xxxxxxxxxx> > Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> > Cc: Hannes Reinecke <hare@xxxxxxx> > Cc: Bart Van Assche <bvanassche@xxxxxxx> > Cc: Damien Le Moal <damien.lemoal@xxxxxxx> > Cc: Long Li <longli@xxxxxxxxxxxxx> > Reported-by: Long Li <longli@xxxxxxxxxxxxx> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > V3: > - add one smp_mb() in scsi_mq_get_budget() and comment > > V2: > - commit log change, no any code change > - add reported-by tag > > > drivers/scsi/scsi_lib.c | 43 ++++++++++++++++++++++++++++++++++++-- > include/scsi/scsi_device.h | 1 + > 2 files changed, 42 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 379533ce8661..d3d237a09a78 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -607,12 +607,17 @@ static bool scsi_end_request(struct request *req, blk_status_t error, > */ > percpu_ref_get(&q->q_usage_counter); > > + /* > + * One smp_mb() is implied by either rq->end_io or > + * blk_mq_free_request for ordering writing .device_busy in > + * scsi_device_unbusy() and reading sdev->restart. > + */ > __blk_mq_end_request(req, error); > > if (scsi_target(sdev)->single_lun || > !list_empty(&sdev->host->starved_list)) > kblockd_schedule_work(&sdev->requeue_work); > - else > + else if (READ_ONCE(sdev->restart)) > blk_mq_run_hw_queues(q, true); > > percpu_ref_put(&q->q_usage_counter); > @@ -1632,8 +1637,42 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx) > struct request_queue *q = hctx->queue; > struct scsi_device *sdev = q->queuedata; > > - if (scsi_dev_queue_ready(q, sdev)) > + if (scsi_dev_queue_ready(q, sdev)) { > + WRITE_ONCE(sdev->restart, 0); > return true; > + } > + > + /* > + * If all in-flight requests originated from this LUN are completed > + * before setting .restart, sdev->device_busy will be observed as > + * zero, then blk_mq_delay_run_hw_queue() will dispatch this request > + * soon. Otherwise, completion of one of these request will observe > + * the .restart flag, and the request queue will be run for handling > + * this request, see scsi_end_request(). > + * > + * However, the .restart flag may be cleared from other dispatch code > + * path after one inflight request is completed, then: > + * > + * 1) if this request is dispatched from scheduler queue or sw queue one > + * by one, this request will be handled in that dispatch path too given > + * the request still stays at scheduler/sw queue when calling .get_budget() > + * callback. > + * > + * 2) if this request is dispatched from hctx->dispatch or > + * blk_mq_flush_busy_ctxs(), this request will be put into hctx->dispatch > + * list soon, and blk-mq will be responsible for covering it, see > + * blk_mq_dispatch_rq_list(). > + */ > + WRITE_ONCE(sdev->restart, 1); > + > + /* > + * Order writting .restart and reading .device_busy, and make sure > + * .restart is visible to scsi_end_request(). Its pair is implied by > + * __blk_mq_end_request() in scsi_end_request() for ordering > + * writing .device_busy in scsi_device_unbusy() and reading .restart. > + * > + */ > + smp_mb(); > > if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev)) > blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY); > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index 202f4d6a4342..9d8ca662ae86 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -109,6 +109,7 @@ struct scsi_device { > atomic_t device_busy; /* commands actually active on LLDD */ > atomic_t device_blocked; /* Device returned QUEUE_FULL. */ > > + unsigned int restart; > spinlock_t list_lock; > struct list_head cmd_list; /* queue of in use SCSI Command structures */ > struct list_head starved_entry; > -- > 2.20.1 > Ping... Thanks, Ming