Re: [PATCH RFC v7 10/12] megaraid_sas: switch fusion adapters to MQ

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 6 Aug 2020 21:38:19 +0800

On Thu, Aug 06, 2020 at 03:55:50PM +0530, Kashyap Desai wrote:
> > > Ming -
> > >
> > > I noted your comments.
> > >
> > > I have completed testing and this particular latest performance issue
> > > on Volume is outstanding.
> > > Currently it is 20-25% performance drop in IOPs and we want that to be
> > > closed before shared host tag is enabled for <megaraid_sas> driver.
> > > Just for my understanding - What will be the next steps on this ?
> > >
> > > I can validate any new approach/patch for this issue.
> > >
> >
> > Hello,
> >
> > What do you think of the following patch?
> 
> I tested this patch. I still see IO hang.
> 
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index c866a4f33871..49f0fc5c7a63 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -552,8 +552,24 @@ static void scsi_run_queue_async(struct scsi_device *sdev)
> >  	if (scsi_target(sdev)->single_lun ||
> >  	    !list_empty(&sdev->host->starved_list))
> >  		kblockd_schedule_work(&sdev->requeue_work);
> > -	else
> > -		blk_mq_run_hw_queues(sdev->request_queue, true);
> > +	else {
> > +		/*
> > +		 * smp_mb() implied in either rq->end_io or blk_mq_free_request
> > +		 * is for ordering writing .device_busy in scsi_device_unbusy()
> > +		 * and reading sdev->restarts.
> > +		 */
> > +		int old = atomic_read(&sdev->restarts);
> > +
> > +		if (old) {
> > +			blk_mq_run_hw_queues(sdev->request_queue, true);
> > +
> > +			/*
> > +			 * ->restarts has to be kept as non-zero if there is
> > +			 *  new budget contention comes.
> > +			 */
> > +			atomic_cmpxchg(&sdev->restarts, old, 0);
> > +		}
> > +	}
> >  }
> >
> >  /* Returns false when no more bytes to process, true if there are more */
> > @@ -1612,8 +1628,34 @@ static void scsi_mq_put_budget(struct request_queue *q)
> >  static bool scsi_mq_get_budget(struct request_queue *q)
> >  {
> >  	struct scsi_device *sdev = q->queuedata;
> > +	int ret = scsi_dev_queue_ready(q, sdev);
> >
> > -	return scsi_dev_queue_ready(q, sdev);
> > +	if (ret)
> > +		return true;
> > +
> > +	/*
> > +	 * If all in-flight requests originated from this LUN are completed
> > +	 * before setting .restarts, sdev->device_busy will be observed as
> > +	 * zero, then blk_mq_delay_run_hw_queue() will dispatch this request
> > +	 * soon. Otherwise, completion of one of these request will observe
> > +	 * the .restarts flag, and the request queue will be run for handling
> > +	 * this request, see scsi_end_request().
> > +	 */
> > +	atomic_inc(&sdev->restarts);
> > +
> > +	/*
> > +	 * Order writing .restarts and reading .device_busy, and make sure
> > +	 * .restarts is visible to scsi_end_request(). Its pair is implied by
> > +	 * __blk_mq_end_request() in scsi_end_request() for ordering
> > +	 * writing .device_busy in scsi_device_unbusy() and reading .restarts.
> > +	 *
> > +	 */
> > +	smp_mb__after_atomic();
> > +
> > +	if (unlikely(atomic_read(&sdev->device_busy) == 0 &&
> > +				!scsi_device_blocked(sdev)))
> > +		blk_mq_delay_run_hw_queues(sdev->request_queue, SCSI_QUEUE_DELAY);
> 
> Hi Ming -
> 
> There is still some race which is not handled.  Take a case of IO is not
> able to get budget and it has already marked <restarts> flag.
> <restarts> flag will be seen non-zero in completion path and completion
> path will attempt h/w queue run. (But this particular IO is still not in
> s/w queue.).
> Attempt of running h/w queue from completion path will not flush any IO
> since there is no IO in s/w queue.

Then where is the IO to be submitted in case of running out of budget?

For any IO request that is added to hctx->dispatch, the queue will be
re-run by the blk-mq core.

Any IO request that is issued directly and runs out of budget will be
inserted into hctx->dispatch or the sw/scheduler queue, and the queue
will then be run in the submission path.
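
To make the intended ordering concrete, here is a rough userspace model of
the handshake between the submission path and the completion path in the
patch above. It is only a sketch under assumptions, not the kernel code:
struct model_sdev, model_init(), model_get_budget(), model_complete() and
the queue_runs counter are all hypothetical names, the budget check is a
simplified stand-in for scsi_dev_queue_ready(), and a seq_cst fence stands
in for the barriers (smp_mb__after_atomic() on one side, the barrier
implied by request completion on the other).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the scsi_device fields touched by the patch. */
struct model_sdev {
	atomic_int device_busy;	/* number of in-flight commands */
	atomic_int restarts;	/* non-zero after a failed budget request */
	int queue_depth;	/* maximum number of in-flight commands */
	int queue_runs;		/* model only: counts queue re-runs */
};

static void model_init(struct model_sdev *s, int depth)
{
	atomic_init(&s->device_busy, 0);
	atomic_init(&s->restarts, 0);
	s->queue_depth = depth;
	s->queue_runs = 0;
}

/* Submission side, loosely modelled on scsi_mq_get_budget() in the patch. */
static bool model_get_budget(struct model_sdev *s)
{
	/* Simplified budget check: take a slot, give it back if over depth. */
	int busy = atomic_fetch_add(&s->device_busy, 1) + 1;

	if (busy <= s->queue_depth)
		return true;
	atomic_fetch_sub(&s->device_busy, 1);

	/* Record the contention, then order that write before reading busy. */
	atomic_fetch_add(&s->restarts, 1);
	atomic_thread_fence(memory_order_seq_cst);

	/* Nothing in flight means no completion will come: re-run ourselves. */
	if (atomic_load(&s->device_busy) == 0)
		s->queue_runs++;	/* models blk_mq_delay_run_hw_queues() */
	return false;
}

/* Completion side, loosely modelled on scsi_run_queue_async() in the patch. */
static void model_complete(struct model_sdev *s)
{
	int old;

	/* Release the slot, then order that write before reading restarts. */
	atomic_fetch_sub(&s->device_busy, 1);
	atomic_thread_fence(memory_order_seq_cst);

	old = atomic_load(&s->restarts);
	if (old) {
		s->queue_runs++;	/* models blk_mq_run_hw_queues() */
		/* Clear only if no new contention arrived in the meantime. */
		atomic_compare_exchange_strong(&s->restarts, &old, 0);
	}
}
```

With a queue depth of 1, a second model_get_budget() call fails and bumps
restarts without re-running the queue, because one command is still in
flight; the following model_complete() then sees restarts != 0, re-runs
the queue once and clears the counter. The model covers only the
.restarts/.device_busy ordering. It says nothing about which list the
request sits on when the queue is re-run, which is the point under
discussion here.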

Thanks, 
Ming