On Mon, Apr 26, 2021 at 06:02:31PM +0100, John Garry wrote:
> On 26/04/2021 17:03, Ming Lei wrote:
> > > For both hostwide and non-hostwide tags, we have standalone sched tags
> > > and a request pool per hctx when q->nr_hw_queues > 1.
> > 
> > Driver tags are shared for hostwide tags.
> > 
> > > > That is why you observe that scheduler tag exhaustion is easy to
> > > > trigger in the case of non-hostwide tags.
> > > > 
> > > > I'd suggest adding a single per-request-queue sched tags and making
> > > > all hctxs share it, just like what you did for the driver tags.
> > > 
> > > That sounds reasonable.
> > > 
> > > But I don't see how this is related to hostwide tags specifically,
> > > rather than just to having q->nr_hw_queues > 1, which NVMe PCI and some
> > > other SCSI MQ HBAs have (without using hostwide tags).
> > 
> > Before hostwide tags, the whole scheduler queue depth should be 256.
> > After hostwide tags, the whole scheduler queue depth becomes
> > 256 * nr_hw_queues. But the driver tag queue depth is _not_ changed.
> 
> Fine.
> 
> > More requests come in and dispatch to the LLD is attempted, but it cannot
> > succeed because of the limited driver tag depth, so CPU utilization can
> > increase.
> 
> Right, maybe this is a problem.
> 
> I quickly added some debug, and see that
> __blk_mq_get_driver_tag()->__sbitmap_queue_get() fails ~7% of the time for
> hostwide tags and ~3% for non-hostwide tags.
> 
> Having it fail at all for non-hostwide tags seems a bit dubious... here's
> the code for deciding the rq sched tag depth:
> 
> q->nr_requests = 2 * min(q->tag_set->queue_depth [128], BLKDEV_MAX_RQ [128])
> 
> So we get 256 for our test scenario, which is appreciably bigger than
> q->tag_set->queue_depth, so the failures make sense.
> 
> Anyway, I'll look at adding code for per-request-queue sched tags to see
> if it helps. But I would plan to continue to use a per-hctx sched request
> pool.

Why not switch to a per-request-queue sched request pool?

Thanks,
Ming
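
For readers following the arithmetic, below is a minimal stand-alone sketch
of how the numbers in this thread relate; it is not kernel code. The 128
tag-set depth and the resulting 256 per-hctx sched depth come from the
figures quoted above, while the nr_hw_queues value of 16 is only an assumed
example.

/*
 * Not kernel code: a stand-alone sketch of the depth arithmetic discussed
 * in this thread. queue_depth = 128 matches the figure quoted above; the
 * nr_hw_queues value is an assumption for illustration only.
 */
#include <stdio.h>

#define BLKDEV_MAX_RQ	128

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	unsigned int queue_depth = 128;	/* q->tag_set->queue_depth in the test above */
	unsigned int nr_hw_queues = 16;	/* assumed hctx count, for illustration */

	/* per-hctx scheduler tag depth, per the calculation quoted above */
	unsigned int nr_requests = 2 * min_u(queue_depth, BLKDEV_MAX_RQ);

	/* each hctx currently gets its own sched tags ... */
	unsigned int total_sched_tags = nr_requests * nr_hw_queues;

	/* ... while hostwide driver tags stay shared across all hctxs */
	unsigned int shared_driver_tags = queue_depth;

	printf("per-hctx sched depth: %u\n", nr_requests);        /* 256 */
	printf("total sched tags:     %u\n", total_sched_tags);   /* 4096 */
	printf("shared driver tags:   %u\n", shared_driver_tags); /* 128 */

	return 0;
}

With those assumed numbers, up to 256 * 16 = 4096 scheduler-tagged requests
can be outstanding across the hctxs while all of them compete for the same
128 shared driver tags, which is the imbalance described above.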