On 04/03/2017 10:41 AM, Arun Easi wrote:
> On Mon, 3 Apr 2017, 8:20am, Bart Van Assche wrote:
>
>> On Mon, 2017-04-03 at 09:29 +0200, Hannes Reinecke wrote:
>>> On 04/03/2017 08:37 AM, Arun Easi wrote:
>>>> If the above is true, then for an LLD to get a tag# within its max-tasks
>>>> range, it has to report max-tasks / number-of-hw-queues in can_queue, and
>>>> in the I/O path, use the tag and hwq# to arrive at an index# to use.
>>>> This, though, leads to poor use of tag resources -- a queue can reach its
>>>> capacity while the LLD can still take more.
>>>
>>> Shared tag sets continue to dog blk-mq on 'legacy' (ie non-NVMe)
>>> HBAs. ATM the only 'real' solution to this problem is indeed to have a
>>> static split of the entire tag space by the number of hardware queues,
>>> with the mentioned tag-starvation problem.
>>
>> Hello Arun and Hannes,
>>
>> Apparently the current blk_mq_alloc_tag_set() implementation is well suited
>> for drivers like NVMe and ib_srp but not for traditional SCSI HBA drivers.
>> How about adding a BLK_MQ_F_* flag that tells __blk_mq_alloc_rq_maps() to
>> allocate a single set of tags for all hardware queues, and also adding a
>> flag to struct scsi_host_template so that SCSI LLDs can enable this
>> behavior?
>>
>
> Hi Bart,
>
> This would certainly be beneficial in my case. Moreover, it certainly
> makes sense to move the logic up to where multiple drivers can leverage it.
>
> Perhaps percpu_ida* interfaces could be used to do that, but I think I read
> somewhere that they are not efficient enough, and that is the reason block
> tags went the current way.

You don't have to change the underlying tag generation to solve this
problem; Bart has already pretty much outlined a fix that would work.
percpu_ida works fine if you never use more than roughly half the
available space, but it's a poor fit for request tags, where we want to
retain good behavior and scaling at or near tag exhaustion.
That's why blk-mq ended up rolling its own, which is now generically
available as lib/sbitmap.c.

-- 
Jens Axboe