On Mon, Nov 04, 2019 at 03:00:47PM +0530, Kashyap Desai wrote:
> > On Fri, Oct 25, 2019 at 03:34:16PM +0530, Kashyap Desai wrote:
> > > > > > > Can we get a supporting API from the block layer (through
> > > > > > > SML)? Something similar to "atomic_read(&hctx->nr_active)",
> > > > > > > which can be derived from sdev->request_queue->hctx?
> > > > > > > At least for those drivers which have nr_hw_queue = 1 it
> > > > > > > will be useful, and we can avoid the sdev->device_busy
> > > > > > > dependency.
> > > > > >
> > > > > > If you mean to add a new atomic counter, we just move
> > > > > > .device_busy into blk-mq, and that can become the new
> > > > > > bottleneck.
> > > > >
> > > > > How about the below? We define and use the API below instead of
> > > > > "atomic_read(&scp->device->device_busy)", and it is giving the
> > > > > expected value. I have not captured the performance impact on
> > > > > the max-IOPS profile.
> > > > >
> > > > > inline unsigned long sdev_nr_inflight_request(struct request_queue *q)
> > > > > {
> > > > > 	struct blk_mq_hw_ctx *hctx;
> > > > > 	unsigned long nr_requests = 0;
> > > > > 	int i;
> > > > >
> > > > > 	queue_for_each_hw_ctx(q, hctx, i)
> > > > > 		nr_requests += atomic_read(&hctx->nr_active);
> > > > >
> > > > > 	return nr_requests;
> > > > > }
> > > >
> > > > There is still a difference between the above and .device_busy in
> > > > case of "none", because .nr_active is actually accounted when
> > > > allocating the request instead of when getting the driver tag (or
> > > > before calling .queue_rq).
> > >
> > > This will be fine as long as we get the outstanding count from
> > > allocation time itself.
> >
> > Fine, but keep that in mind.
> >
> > > > Also, the above only works in case there is more than one active
> > > > LUN.
> > >
> > > I am not able to understand this part. We have tested on a setup
> > > which has only one active LUN and it works. Can you help me to
> > > understand this part?
> > Please see blk_mq_rq_ctx_init():
> >
> > 	if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
> > 		rq_flags = RQF_MQ_INFLIGHT;
> > 		...
> > 	}
> >
> > blk_mq_init_allocated_queue
> >   blk_mq_add_queue_tag_set
> >     blk_mq_update_tag_set_depth(true)
> >       queue_set_hctx_shared(q, shared)
>
> Ming, thanks for the pointers. Now I am able to follow you. A single
> active LUN does not really require a shared tag, so the block layer
> starts using BLK_MQ_F_TAG_SHARED only once there is more than one
> active LUN. This limitation should be fine for the megaraid_sas and
> mpt3sas drivers. BTW, how about using the BLK_MQ_F_TAG_SHARED flag for
> the one-active-LUN case as well?

I guess it won't be accepted, given that .nr_active is used for fair
driver-tag allocation among all LUNs, and the counting does have a cost.

> It will help us to remove the single-LUN limitation, so that any other
> driver module can take benefit of the same API.
>
> I think we have to provide the API "sdev_nr_inflight_request" from the
> block layer OR the SCSI midlayer. For this RFC, we need the additional
> API discussed in this thread so that the megaraid_sas and mpt3sas
> drivers do not break key functionality which has a dependency on
> sdev->device_busy.

I will take a close look at this usage, and see if there is a better way.

Thanks,
Ming