On Fri, Apr 28, 2017 at 11:15:36PM +0800, Ming Lei wrote: > When blk-mq I/O scheduler is used, we need two tags for > submitting one request. One is called scheduler tag for > allocating request and scheduling I/O, another one is called > driver tag, which is used for dispatching IO to hardware/driver. > This way introduces one extra per-queue allocation for both tags > and request pool, and may not be as efficient as case of none > scheduler. > > Also currently we put a default per-hctx limit on schedulable > requests, and this limit may be a bottleneck for some devices, > especialy when these devices have a quite big tag space. > > This patch introduces BLK_MQ_F_SCHED_USE_HW_TAG so that we can > allow to use hardware/driver tags directly for IO scheduling if > devices's hardware tag space is big enough. Then we can avoid > the extra resource allocation and make IO submission more > efficient. > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > block/blk-mq-sched.c | 10 +++++++++- > block/blk-mq.c | 35 +++++++++++++++++++++++++++++------ > include/linux/blk-mq.h | 1 + > 3 files changed, 39 insertions(+), 7 deletions(-) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 27c67465f856..45a675f07b8b 100644 > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 0168b27469cb..e530bc54f0d9 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -263,9 +263,19 @@ struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data, > rq->rq_flags = RQF_MQ_INFLIGHT; > atomic_inc(&data->hctx->nr_active); > } > - rq->tag = tag; > - rq->internal_tag = -1; > - data->hctx->tags->rqs[rq->tag] = rq; > + data->hctx->tags->rqs[tag] = rq; > + > + /* > + * If we use hw tag for scheduling, postpone setting > + * rq->tag in blk_mq_get_driver_tag(). > + */ > + if (data->hctx->flags & BLK_MQ_F_SCHED_USE_HW_TAG) { > + rq->tag = -1; > + rq->internal_tag = tag; > + } else { > + rq->tag = tag; > + rq->internal_tag = -1; > + } I'm guessing you did it this way because we currently check rq->tag to decided whether this is a flush that needs to be bypassed? Makes sense, but I'm adding it to my list of reasons why the flush stuff sucks. > @@ -893,9 +909,15 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx, > static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, > struct request *rq) > { > - blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag); > + unsigned tag = rq->tag; The pickiest of all nits, but we mostly spell out `unsigned int` in this file, it'd be nice to stay consistent.