Decouple the number of tags available from the number of hardware queues
by sharing a single blk_mq_tags amongst all hardware queues.

When storage latency is relatively high, having too many tags available
can harm the performance of mixed workloads. By sharing blk_mq_tags
amongst hardware queues, nr_requests can be set to the appropriate
number of tags for the device.

Signed-off-by: Melanie Plageman <melanieplageman@xxxxxxxxx>
---
As an example, on a 16-core VM coupled with a 1 TiB storage device
having a combined (VM + disk) max BW of 200 MB/s and max IOPS of 5000,
configured with 16 hardware queues, nr_requests set to 56, and
queue_depth set to 15, the following fio job description illustrates
the benefit of hardware queues sharing blk_mq_tags:

[global]
time_based=1
ioengine=io_uring
direct=1
runtime=60

[read_hogs]
bs=16k
iodepth=10000
rw=randread
filesize=10G
numjobs=15
directory=/mnt/test

[wal]
bs=8k
iodepth=3
filesize=4G
rw=write
numjobs=1
directory=/mnt/test

With hctx_share_tags set, the "wal" job does 271 IOPS with an average
completion latency of 13120 usec, and the "read_hogs" jobs average
around 4700 IOPS.

Without hctx_share_tags set, the "wal" job does 85 IOPS with an average
completion latency of around 45308 usec, and the "read_hogs" jobs
average around 4900 IOPS.

Note that reducing nr_requests far enough to increase WAL IOPS results
in unacceptably low IOPS for the random reads when only one random read
job is running.

 drivers/scsi/storvsc_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 0ed764bcabab..5048e7fcf959 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1997,6 +1997,7 @@ static struct scsi_host_template scsi_driver = {
 	.track_queue_depth = 1,
 	.change_queue_depth = storvsc_change_queue_depth,
 	.per_device_tag_set = 1,
+	.hctx_share_tags = 1,
 };
 
 enum {
-- 
2.25.1
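
As a sketch of the per-device tuning referred to above, nr_requests can
be adjusted through the standard block layer sysfs attribute once tags
are shared across hardware queues; the device name below is a
placeholder for the storvsc disk under test:

  # echo 56 > /sys/block/<dev>/queue/nr_requests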