On 26/04/2021 17:03, Ming Lei wrote:
For both hostwide and non-hostwide tags, we have standalone sched tags and
a request pool per hctx when q->nr_hw_queues > 1. Driver tags are shared
for hostwide tags.
That is why you observe that scheduler tag exhaustion is easy to trigger
in the case of non-hostwide tags.
I'd suggest adding one per-request-queue sched tags and making all hctxs
share it, just like what you did for driver tags.
That sounds reasonable.
But I don't see how this is related to hostwide tags specifically; it
comes rather from having q->nr_hw_queues > 1, which NVMe PCI and some
other SCSI MQ HBAs have (without using hostwide tags).
Before hostwide tags, the whole scheduler queue depth should be 256.
After hostwide tags, the whole scheduler queue depth becomes 256 *
nr_hw_queues. But the driver tag queue depth is _not_ changed.
Fine.
More requests come in and we try to dispatch them to the LLD, but they
can't all succeed because of the limited driver tag depth, so CPU
utilization could increase.
Right, maybe this is a problem.
I quickly added some debug, and see that
__blk_mq_get_driver_tag()->__sbitmap_queue_get() fails ~7% of the time
for hostwide tags and ~3% for non-hostwide tags.
Having it fail at all for non-hostwide tags seems a bit dubious...
Here's the code for deciding the rq sched tag depth:

q->nr_requests = 2 * min(q->tag_set->queue_depth [128], BLKDEV_MAX_RQ [128])

So we get 256 for our test scenario, which is appreciably bigger than
q->tag_set->queue_depth, so the failures make sense.
Anyway, I'll look at adding code for per-request-queue sched tags to see
if it helps. But I would plan to continue to use a per-hctx sched request
pool.
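
Roughly the shape of what I have in mind for the shared part. This is
completely untested, against no particular tree, and the
sched_bitmap_tags member plus the helper name are placeholders I'm making
up here rather than existing API. The per-hctx sched_tags and request
pool stay; only the tag bitmap is pointed at a queue-wide sbitmap,
mirroring what hostwide driver tags already do:

/*
 * Sketch only: allocate one queue-wide sbitmap for sched tags and make
 * every hctx's sched_tags use it, while keeping the per-hctx request
 * pool (static_rqs) as it is today.
 */
static int blk_mq_init_sched_shared_tags(struct request_queue *q,
					 unsigned int depth)
{
	struct blk_mq_hw_ctx *hctx;
	int i;

	/* q->sched_bitmap_tags would be a new struct sbitmap_queue member */
	if (sbitmap_queue_init_node(&q->sched_bitmap_tags, depth, -1, false,
				    GFP_KERNEL, q->node))
		return -ENOMEM;

	queue_for_each_hw_ctx(q, hctx, i) {
		/* Share the tag bitmap, keep everything else per-hctx */
		hctx->sched_tags->bitmap_tags = &q->sched_bitmap_tags;
	}

	return 0;
}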
Thanks,
John