On 04/03/2017 10:41 AM, Arun Easi wrote:
> On Mon, 3 Apr 2017, 8:20am, Bart Van Assche wrote:
>
>> On Mon, 2017-04-03 at 09:29 +0200, Hannes Reinecke wrote:
>>> On 04/03/2017 08:37 AM, Arun Easi wrote:
>>>> If the above is true, then for an LLD to get a tag# within its max-tasks
>>>> range, it has to report max-tasks / number-of-hw-queues in can_queue, and
>>>> in the I/O path, use the tag and hwq# to arrive at an index# to use.
>>>> This, though, leads to poor use of tag resources -- a queue can reach its
>>>> capacity while the LLD can still take more.
>>>
>>> Shared tag sets continue to dog blk-mq on 'legacy' (ie non-NVMe)
>>> HBAs. ATM the only 'real' solution to this problem is indeed to have a
>>> static split of the entire tag space by the number of hardware queues,
>>> with the mentioned tag-starvation problem.
>>
>> Hello Arun and Hannes,
>>
>> Apparently the current blk_mq_alloc_tag_set() implementation is well suited
>> for drivers like NVMe and ib_srp but not for traditional SCSI HBA drivers.
>> How about adding a BLK_MQ_F_* flag that tells __blk_mq_alloc_rq_maps() to
>> allocate a single set of tags for all hardware queues, and also adding a
>> flag to struct scsi_host_template so that SCSI LLDs can enable this
>> behavior?
>>
>
> Hi Bart,
>
> This would certainly be beneficial in my case. Moreover, it certainly
> makes sense to move the logic up to where multiple drivers can leverage it.
>
> Perhaps percpu_ida* interfaces could be used to do that, but I think I read
> somewhere that they are not efficient enough, and that is the reason block
> tags went the current way.

You don't have to change the underlying tag generation to solve this
problem; Bart has already pretty much outlined a fix that would work.
percpu_ida works fine if you never use more than roughly half the
available space, but it's a poor fit for request tags, where we want to
retain good behavior and scaling at or near tag exhaustion.
That's why blk-mq ended up rolling its own, which is now generically
available as lib/sbitmap.c.

-- 
Jens Axboe