On Mon, Apr 26, 2021 at 04:52:28PM +0100, John Garry wrote:
> On 26/04/2021 15:48, Ming Lei wrote:
> > > --0.56%--sbitmap_get
> > >
> > > I don't see this for hostwide tags - this may be because we have
> > > multiple hctx, and the IO sched tags are per hctx, so less chance of
> > > exhaustion. But this is not from hostwide tags specifically, but from
> > > multiple HW queues in general. As I understood, sched tags were meant
> > > to be per request queue, right? Am I reading this correctly?
> >
> > sched tags are still per-hctx.
> >
> > I just found that you didn't change sched tags into per-request-queue
> > shared tags. So for hostwide tags, each hctx still has its own
> > standalone sched tags and request pool, which is one big difference
> > from non-hostwide tags.
>
> For both hostwide and non-hostwide tags, we have standalone sched tags
> and a request pool per hctx when q->nr_hw_queues > 1.

Driver tags are shared for hostwide tags.

> > That is why you observe that scheduler tag exhaustion is easy to
> > trigger in the non-hostwide tags case.
> >
> > I'd suggest adding one per-request-queue sched tags and making all
> > hctxs share it, just like what you did for the driver tags.
>
> That sounds reasonable.
>
> But I don't see how this is related to hostwide tags specifically,
> rather than just to having q->nr_hw_queues > 1, which NVMe PCI and some
> other SCSI MQ HBAs have (without using hostwide tags).

Before hostwide tags, the whole scheduler queue depth is 256. After
hostwide tags, the whole scheduler queue depth becomes 256 *
nr_hw_queues, but the driver tag queue depth is _not_ changed.

More requests come in and are attempted to be dispatched to the LLD, but
they can't succeed because of the limited driver tag depth, so CPU
utilization can increase.

Thanks,
Ming
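
To make the tag-depth arithmetic above concrete, here is a minimal
userspace C sketch. This is not blk-mq code; the queue count and the
256-deep tag sizes are assumed values chosen only to mirror the example
in the mail, and the variable names are illustrative:

    /*
     * Toy model of the sched-tag vs driver-tag depth mismatch.
     * Not blk-mq code; all values below are illustrative assumptions.
     */
    #include <stdio.h>

    int main(void)
    {
            unsigned int nr_hw_queues = 16;   /* assumed number of hctxs     */
            unsigned int sched_depth  = 256;  /* per-hctx sched tag depth    */
            unsigned int driver_depth = 256;  /* shared (hostwide) driver tags */

            /*
             * Each hctx keeps its own sched tags and request pool, so the
             * scheduler side can hold nr_hw_queues * sched_depth requests...
             */
            unsigned long total_sched =
                    (unsigned long)nr_hw_queues * sched_depth;

            /*
             * ...but only driver_depth of them can own a driver tag and be
             * issued to the LLD at any one time.  The rest keep failing
             * driver tag allocation during dispatch, which is where the
             * extra CPU time goes.
             */
            unsigned long waiting = total_sched > driver_depth ?
                    total_sched - driver_depth : 0;

            printf("schedulable requests    : %lu\n", total_sched);
            printf("dispatchable at once    : %u\n", driver_depth);
            printf("stuck behind driver tags: %lu\n", waiting);
            return 0;
    }

With these assumed numbers the scheduler side can hold 4096 requests
while only 256 can hold a driver tag, which is the mismatch being
described.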