On Thu, Aug 9, 2018 at 10:09 AM, Jianchao Wang <jianchao.w.wang@xxxxxxxxxx> wrote:
> Currently, we count the hctx as active only after a driver tag has been
> allocated successfully. If a previously inactive hctx tries to get a tag
> for the first time, it may fail and have to wait. However, because of the
> stale tags->active_queues count, the other shared-tags users are still
> able to occupy all driver tags while someone is waiting for a tag.
> Consequently, even when the previously inactive hctx is woken up, it may
> still be unable to get a tag and can be starved.
>
> To fix this, count the hctx as active before trying to allocate a driver
> tag; then, while it is waiting for a tag, the other shared-tags users
> will reserve budget for it.
>
> Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> ---
>
> V3:
>  add more detailed comment
>
> V2:
>  only invoke blk_mq_tag_busy w/o io scheduler in blk_mq_get_request
>
>  block/blk-mq-tag.c | 3 +++
>  block/blk-mq.c     | 8 ++++++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 09b2ee6..a8ebcbd 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -23,6 +23,9 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
>
>  /*
>   * If a previously inactive queue goes active, bump the active user count.
> + * We need to do this before trying to allocate a driver tag, so that
> + * even if the first tag allocation fails, the other shared-tag users
> + * will reserve budget for it.
>   */
>  bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ae44e85..75ac3fbd 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -285,7 +285,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>                 rq->tag = -1;
>                 rq->internal_tag = tag;
>         } else {
> -               if (blk_mq_tag_busy(data->hctx)) {
> +               if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
>                         rq_flags = RQF_MQ_INFLIGHT;
>                         atomic_inc(&data->hctx->nr_active);
>                 }
> @@ -367,6 +367,8 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>                 if (!op_is_flush(op) && e->type->ops.mq.limit_depth &&
>                     !(data->flags & BLK_MQ_REQ_RESERVED))
>                         e->type->ops.mq.limit_depth(op, data);
> +       } else {
> +               blk_mq_tag_busy(data->hctx);
>         }
>
>         tag = blk_mq_get_tag(data);
> @@ -972,6 +974,7 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
>                 .hctx = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
>                 .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
>         };
> +       bool shared;
>
>         might_sleep_if(wait);
>
> @@ -981,9 +984,10 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
>         if (blk_mq_tag_is_reserved(data.hctx->sched_tags, rq->internal_tag))
>                 data.flags |= BLK_MQ_REQ_RESERVED;
>
> +       shared = blk_mq_tag_busy(data.hctx);
>         rq->tag = blk_mq_get_tag(&data);
>         if (rq->tag >= 0) {
> -               if (blk_mq_tag_busy(data.hctx)) {
> +               if (shared) {
>                         rq->rq_flags |= RQF_MQ_INFLIGHT;
>                         atomic_inc(&data.hctx->nr_active);
>                 }
> --
> 2.7.4
>

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming Lei
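
For context, the "budget" the changelog refers to comes from blk-mq's
shared-tag fairness check: when a hardware queue shares a tag set, it is only
allowed roughly total_depth / active_queues driver tags. The sketch below is a
simplified, illustrative version of that check (modelled on the
hctx_may_queue() logic in blk-mq, not the exact kernel code); it shows why
bumping tags->active_queues before the allocation attempt makes the other
shared-tag users leave room for a queue that is about to wait.

	/*
	 * Simplified sketch of the shared-tag fairness check. Each active
	 * queue is allowed roughly an equal share of the tag depth, so a
	 * queue that was counted as active before it started waiting has
	 * tags reserved for it by the other users.
	 */
	static bool hctx_may_queue_sketch(struct blk_mq_hw_ctx *hctx,
					  unsigned int total_depth)
	{
		unsigned int users, depth;

		if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
			return true;

		users = atomic_read(&hctx->tags->active_queues);
		if (!users)
			return true;

		/* Give each active queue a fair share, with a small minimum. */
		depth = max((total_depth + users - 1) / users, 4U);

		return atomic_read(&hctx->nr_active) < depth;
	}

With the patch applied, the waiting hctx is already included in
active_queues, so "users" above is one higher and the per-queue share of the
other queues shrinks accordingly, leaving tags free for the waiter once it is
woken up.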