Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags

blk_mq_queue_tag_busy_iter() needn't such change?

I didn't think so.

blk_mq_queue_tag_busy_iter() will indeed re-iterate the tags per hctx.
However, in bt_iter() we check rq->mq_hctx == hctx before calling the
iter callback:

static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
	...

	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);

And this check would only pass for the hctx which we're actually iterating.
That holds for both shared and non-shared sbitmap, since we don't share
hctxs, so what does it matter?

It matters that we are doing the right thing for shared tags. My point is that we do iterate, but don't call the callback unless the hctx matches.

As I see it, this has not changed in the transition from shared sbitmap to shared tags.

With a single shared tag set, you can iterate over
all requests originating from all hw queues, right?

Right, for the same request queue, we should do that.

Indeed, it would be nice not to iterate an excessive number of times, but I
didn't see a straightforward way to change that.


In Kashyap's report, the lock contention is actually from
blk_mq_queue_tag_busy_iter(), see:

https://lore.kernel.org/linux-block/8867352d-2107-1f8a-0f1c-ef73450bf256@xxxxxxxxxx/


As I understand it, Kashyap reported no throughput regression with my series, just higher CPU usage in blk_mq_find_and_get_req().

I'll see if I can reproduce that in my setup.

But could it be that, since we only have a single set of requests per tagset rather than a set of requests per HW queue, there is more contention on the common set of requests in the refcount_inc_not_zero() call, marked *** below:

static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
unsigned int bitnr)
{
	...

	rq = tags->rqs[bitnr];
	if (... || !refcount_inc_not_zero(&rq->ref))	/* *** */
	...
}

But I wonder why this function is even called often...

There is also blk_mq_all_tag_iter():

void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
		void *priv)
{
	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
}

But then the only user is blk_mq_hctx_has_requests():

static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
{
	struct blk_mq_tags *tags = hctx->sched_tags ?
			hctx->sched_tags : hctx->tags;
	struct rq_iter_data data = {
		.hctx	= hctx,
	};

	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
	return data.has_rq;
}
The one above only iterates over the specified hctx's tags, so it won't be
affected.

But, again like bt_iter(), blk_mq_has_request() will check that the hctx matches.

I don't see what matters w.r.t. checking the hctx.

I'm just saying that something like the following would be broken for shared tags:

static bool blk_mq_has_request(struct request *rq, void *data, bool reserved)
{
	struct rq_iter_data *iter_data = data;

	/* broken: no rq->mq_hctx == iter_data->hctx check */
	iter_data->has_rq = true;
	return true;
}

static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
{
	struct blk_mq_tags *tags = hctx->sched_tags ?
			hctx->sched_tags : hctx->tags;
	struct rq_iter_data data = {
		/* broken: .hctx not set */
	};

	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
	return data.has_rq;
}

It ignores that we want to check for a specific hctx.

Thanks,
John


