On Wed, Oct 13, 2021 at 12:11:12PM +0100, John Garry wrote:
> > > > blk_mq_queue_tag_busy_iter() needn't such a change?
> > >
> > > I didn't think so.
> > >
> > > blk_mq_queue_tag_busy_iter() will indeed re-iterate the tags per hctx.
> > > However, in bt_iter(), we check rq->mq_hctx == hctx before calling the
> > > iter callback:
> > >
> > > static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> > > {
> > >         ...
> > >
> > >         if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> > >                 ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> > >
> > > And this would only pass for the correct hctx which we're iterating for.
> >
> > It is true for both shared and non-shared sbitmap since we don't share
> > the hctx, so what does it matter?
>
> It matters that we are doing the right thing for shared tags. My point is
> that we iterate but don't call the callback unless it is for the correct
> hctx.
>
> As I see it, this has not changed in the transition from shared sbitmap to
> shared tags.
>
> > With single shared tags, you can iterate over
> > all requests originating from all hw queues, right?
>
> Right, for the same request queue, we should do that.
>
> > > Indeed, it would be nice not to iterate excessive times, but I didn't
> > > see a straightforward way to change that.
> >
> > In Kashyap's report, the lock contention is actually from
> > blk_mq_queue_tag_busy_iter(), see:
> >
> > https://lore.kernel.org/linux-block/8867352d-2107-1f8a-0f1c-ef73450bf256@xxxxxxxxxx/
>
> As I understand it, Kashyap mentioned no throughput regression with my
> series, just higher cpu usage in blk_mq_find_and_get_req().
>
> I'll see if I can reproduce such a thing in my setup.
>
> But could it be that since we only have a single set of requests per
> tagset, and not a set of requests per HW queue, there is more contention
> on the common set of requests in the refcount_inc_not_zero() call ***,
> below:
>
> static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>                 unsigned int bitnr)
> {
>         ...
>
>         rq = tags->rqs[bitnr];
>         if (... || !refcount_inc_not_zero(&rq->ref))    ***
>         ...
> }

Kashyap's log shows that contention on tags->lock is increased, and that
should be caused by the nr_hw_queues iterations: blk_mq_find_and_get_req()
is now run nr_hw_queues times compared with pre-shared-sbitmap, since it
runs before the rq->mq_hctx check.
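To make the cost concrete, here is a quick user-space model of that
iteration pattern (just a sketch, not kernel code; the names, queue/tag
counts, and the round-robin spread of requests are invented for
illustration):

/* cc -o iter_model iter_model.c && ./iter_model */
#include <stdio.h>

#define NR_HW_QUEUES    4
#define NR_TAGS         8

static int rq_hctx[NR_TAGS];    /* which hctx owns each busy request */
static int find_and_get_calls;  /* models blk_mq_find_and_get_req() */
static int callback_calls;      /* models iter_data->fn() */

static void bt_iter_model(int hctx)
{
        int bitnr;

        for (bitnr = 0; bitnr < NR_TAGS; bitnr++) {
                /* the find-and-get (refcount_inc_not_zero()) runs here,
                 * before the hctx check */
                find_and_get_calls++;
                if (rq_hctx[bitnr] == hctx)
                        callback_calls++;
        }
}

int main(void)
{
        int i;

        /* spread the busy requests round-robin across the hw queues */
        for (i = 0; i < NR_TAGS; i++)
                rq_hctx[i] = i % NR_HW_QUEUES;

        /* blk_mq_queue_tag_busy_iter(): walk the shared tags once per hctx */
        for (i = 0; i < NR_HW_QUEUES; i++)
                bt_iter_model(i);

        printf("find_and_get %d, callbacks %d\n",
               find_and_get_calls, callback_calls);
        return 0;
}

With 4 hw queues and 8 busy tags this prints "find_and_get 32,
callbacks 8": the find-and-get step runs nr_hw_queues * nr_tags times
for only nr_tags delivered callbacks, which matches the extra cpu usage
described above.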
> But I wonder why this function is even called so often...
>
> > > There is also blk_mq_all_tag_iter():
> > >
> > > void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
> > >                 void *priv)
> > > {
> > >         __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
> > > }
> > >
> > > But then the only user is blk_mq_hctx_has_requests():
> > >
> > > static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> > > {
> > >         struct blk_mq_tags *tags = hctx->sched_tags ?
> > >                         hctx->sched_tags : hctx->tags;
> > >         struct rq_iter_data data = {
> > >                 .hctx = hctx,
> > >         };
> > >
> > >         blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
> > >         return data.has_rq;
> > > }
> >
> > This one above only iterates over the specified hctx/tags, so it won't
> > be affected.
>
> > > But, again like bt_iter(), blk_mq_has_request() will check that the
> > > hctx matches:
> >
> > I don't see what matters wrt. checking the hctx.
>
> I'm just saying that something like the following would be broken for
> shared tags:
>
> static bool blk_mq_has_request(struct request *rq, void *data, bool
>                 reserved)
> {
>         struct rq_iter_data *iter_data = data;
>
>         iter_data->has_rq = true;
>         return true;
> }
>
> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
> {
>         struct rq_iter_data data = {
>         };
>
>         blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
>         return data.has_rq;
> }
>
> As it ignores that we want to check for a specific hctx.

No, that isn't what I meant; the change I suggested follows:

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 72a2724a4eee..2a2ad6dfcc33 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	if (!rq)
 		return true;
 
-	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
-		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+	if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
+			blk_mq_is_shared_tags(hctx->flags)))
+		ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
 	blk_mq_put_rq_ref(rq);
 	return ret;
 }
@@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		if (tags->nr_reserved_tags)
 			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
 		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+
+		if (blk_mq_is_shared_tags(hctx->flags))
+			break;
 	}
 	blk_queue_exit(q);
 }
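Applied to the same toy model from earlier in this mail (again just a
sketch; a plain boolean stands in for blk_mq_is_shared_tags(), and the
counts are illustrative), the effect of the two hunks is that bt_iter()
delivers every busy request on the first pass, using rq->mq_hctx for the
callback, and the per-hctx loop stops after one pass:

#include <stdbool.h>
#include <stdio.h>

#define NR_HW_QUEUES    4
#define NR_TAGS         8

static int rq_hctx[NR_TAGS];
static int find_and_get_calls;
static int callback_calls;
static const bool shared_tags = true;   /* blk_mq_is_shared_tags() stand-in */

static void bt_iter_model(int hctx)
{
        int bitnr;

        for (bitnr = 0; bitnr < NR_TAGS; bitnr++) {
                find_and_get_calls++;
                /* deliver for any owning hctx when the tag space is shared */
                if (rq_hctx[bitnr] == hctx || shared_tags)
                        callback_calls++;
        }
}

int main(void)
{
        int i;

        for (i = 0; i < NR_TAGS; i++)
                rq_hctx[i] = i % NR_HW_QUEUES;

        for (i = 0; i < NR_HW_QUEUES; i++) {
                bt_iter_model(i);
                if (shared_tags)
                        break;  /* one pass already covered every hw queue */
        }

        printf("find_and_get %d, callbacks %d\n",
               find_and_get_calls, callback_calls);
        return 0;
}

This prints "find_and_get 8, callbacks 8", so the per-tag work no longer
scales with nr_hw_queues while every request is still visited exactly
once.

Thanks,
Ming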