> > > > Can we get a supporting API from the block layer (through SML) ?
> > > > Something similar to "atomic_read(&hctx->nr_active)", which can be
> > > > derived from sdev->request_queue->hctx ?
> > > > At least for those drivers where nr_hw_queue = 1 it will be
> > > > useful, and we can avoid the sdev->device_busy dependency.
> > >
> > > If you mean adding a new atomic counter, we would just be moving
> > > .device_busy into blk-mq, and that can become the new bottleneck.
> >
> > How about the below ? We define and use the API below instead of
> > "atomic_read(&scp->device->device_busy)", and it returns the expected
> > value. I have not yet measured the performance impact on the max-IOPS
> > profile.
> >
> > static inline unsigned long sdev_nr_inflight_request(struct request_queue *q)
> > {
> > 	struct blk_mq_hw_ctx *hctx;
> > 	unsigned long nr_requests = 0;
> > 	int i;
> >
> > 	queue_for_each_hw_ctx(q, hctx, i)
> > 		nr_requests += atomic_read(&hctx->nr_active);
> >
> > 	return nr_requests;
> > }
>
> There is still a difference between the above and .device_busy in the case
> of "none", because .nr_active is actually accounted when allocating the
> request rather than when getting the driver tag (i.e. before calling
> .queue_rq).

That is fine for us, as long as the outstanding count is taken from
allocation time itself.

> Also, the above only works in the case that there is more than one active
> LUN.

I am not able to understand this part. We have tested on a setup which has
only one active LUN, and it works. Can you help me understand it ?

> If you don't need it in the case of a single LUN AND don't care about the
> difference in the case of "none", the above API looks fine.
>
> Thanks,
> Ming