On Fri, 2017-04-28 at 12:22 -0600, Jens Axboe wrote:
> On 04/28/2017 09:15 AM, Ming Lei wrote:
> > +/*
> > + * If this queue has enough hardware tags and doesn't share tags with
> > + * other queues, just use hw tag directly for scheduling.
> > + */
> > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
> > +{
> > +	if (q->tag_set->flags & BLK_MQ_F_TAG_SHARED)
> > +		return false;
> > +
> > +	if (blk_mq_get_queue_depth(q) < q->nr_requests)
> > +		return false;
>
> I think we should leave a bigger gap. Ideally, for scheduling, we should
> have a hw queue depth that's around half of what the scheduler has to
> work with. That will always leave us something to schedule with, if the
> hw can't deplete the whole pool.

Hello Jens,

The scsi-mq core allocates exactly the same number of tags per hardware
queue as the SCSI queue depth. Requiring that there is a gap would cause
BLK_MQ_F_SCHED_USE_HW_TAG not to be enabled for any scsi-mq LLD. I'm not
sure that changing the tag allocation strategy in scsi-mq would be the
best solution. How about changing blk_mq_sched_may_use_hw_tag() into
something like the below to guarantee that the scheduler has sufficient
tags available?

static bool blk_mq_sched_may_use_hw_tag(struct request_queue *q)
{
	return blk_mq_get_queue_depth(q) >= max(q->nr_requests, 16);
}

Thanks,

Bart.
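
For illustration only, here is a minimal userspace sketch that models the
two checks discussed above so they can be compared side by side. It is not
kernel code: struct queue_model, its fields, MIN_SCHED_TAGS and both
function names are hypothetical stand-ins, and the final "return true" of
the quoted check is assumed because the quote ends before that point.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the request_queue fields involved. */
struct queue_model {
	unsigned int hw_depth;      /* models blk_mq_get_queue_depth(q) */
	unsigned long nr_requests;  /* models q->nr_requests */
	bool tags_shared;           /* models BLK_MQ_F_TAG_SHARED being set */
};

#define MIN_SCHED_TAGS 16UL /* the floor of 16 from the suggestion above */

/* Check as quoted from the patch; the final "return true" is assumed. */
static bool may_use_hw_tag_patch(const struct queue_model *q)
{
	if (q->tags_shared)
		return false;
	if (q->hw_depth < q->nr_requests)
		return false;
	return true;
}

/* Suggested replacement: hw depth must cover max(nr_requests, 16). */
static bool may_use_hw_tag_suggested(const struct queue_model *q)
{
	unsigned long need = q->nr_requests > MIN_SCHED_TAGS ?
			     q->nr_requests : MIN_SCHED_TAGS;

	return q->hw_depth >= need;
}
```

With hw_depth = 8 and nr_requests = 8, the quoted check passes while the
suggested one fails, because 8 is below the floor of 16. With a shared tag
set and hw_depth = nr_requests = 64, the quoted check fails and the
suggested one passes, since the one-line version as written does not test
BLK_MQ_F_TAG_SHARED.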