Re: About scsi device queue depth

John Garry <john.garry@xxxxxxxxxx> · Tue, 12 Jan 2021 08:56:45 +0000

Hi Ming,

I was looking at an IOMMU issue on an LSI RAID 3008 card, and noticed that
performance there is lower than what I get on other SAS HBAs.

After some debugging and fiddling with the sdev queue depth in the mpt3sas
driver, I am finding that performance changes appreciably with sdev queue depth:

sdev qdepth     fio number jobs*    1       10      20
16                                  1590    1654    1660
32                                  1545    1646    1654
64                                  1436    1085    1070
254 (default)                       1436    1070    1050
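
To put the spread in the table above into relative terms, here is a minimal sketch that computes the percentage drop of each row against the best row in the same column. The numbers are copied from the table; the helper name drop_vs_best is made up for illustration.

```python
# IOPS figures copied from the table above, keyed by sdev queue depth.
# Columns correspond to fio numjobs = 1, 10, 20.
RESULTS = {
    16: (1590, 1654, 1660),
    32: (1545, 1646, 1654),
    64: (1436, 1085, 1070),
    254: (1436, 1070, 1050),
}
JOBS = (1, 10, 20)


def drop_vs_best(results, jobs=JOBS):
    """Return {numjobs: {qdepth: percent drop relative to the best row}}."""
    out = {}
    for col, njobs in enumerate(jobs):
        best = max(row[col] for row in results.values())
        out[njobs] = {
            qdepth: round(100.0 * (best - row[col]) / best, 1)
            for qdepth, row in results.items()
        }
    return out


if __name__ == "__main__":
    for njobs, drops in drop_vs_best(RESULTS).items():
        print(f"numjobs={njobs}: {drops}")
```

With 20 jobs, the default depth of 254 comes out roughly a third below the depth-16 result.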

What do the performance numbers mean? IOPS or something else? What is the
fio I/O test? Random I/O or sequential I/O?

Those figures are read performance in units of 1K IOPS, so 1590, above, is
1.59M read IOPS. Here's the fio script:

[global]
rw=read
direct=1
ioengine=libaio
iodepth=40
numjobs=20
bs=4k
;size=10240000m
;zero_buffers=1
group_reporting=1
;ioscheduler=noop
;cpumask=0xffe
;cpus_allowed=1-47
;gtod_reduce=1
;iodepth_batch=2
;iodepth_batch_complete=2
runtime=60
;thread
loops=10000

fio queue depth is 40, and I'm using 12x SAS SSDs.

I got a comparable disparity in results for fio queue depth = 128 and num
jobs = 1:

sdev qdepth     fio number jobs*    1
16                                  1640
32                                  1618
64                                  1577
254 (default)                       1437

IO sched = none.

That driver also sets queue depth tracking = 1, but it never seems to kick in.

So it seems to me that the block layer is merging more bios per request, as
the average sg count per request goes up from 1 to 6 or more. As far as I can
see, when the queue depth is lowered, the only thing that really changes is
that we fail more often to get the budget in
scsi_mq_get_budget()->scsi_dev_queue_ready().
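
As a rough illustration of why a lower queue depth means more budget failures, here is a toy model of the depth check only. It is a sketch under loud assumptions: ToySdev and count_budget_failures are hypothetical names, not kernel code, and the real scsi_dev_queue_ready() also deals with blocked devices and kernel-side accounting that this ignores.

```python
class ToySdev:
    """Toy stand-in for a SCSI device; NOT the kernel's struct scsi_device."""

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth
        self.in_flight = 0  # commands currently holding budget

    def get_budget(self):
        """Grant budget only while in-flight commands are below queue_depth."""
        if self.in_flight >= self.queue_depth:
            return False
        self.in_flight += 1
        return True

    def put_budget(self):
        """Release budget when a command completes."""
        if self.in_flight == 0:
            raise RuntimeError("put_budget without matching get_budget")
        self.in_flight -= 1


def count_budget_failures(queue_depth, outstanding):
    """Try to issue `outstanding` commands before any completes; count refusals."""
    sdev = ToySdev(queue_depth)
    return sum(0 if sdev.get_budget() else 1 for _ in range(outstanding))


if __name__ == "__main__":
    for depth in (16, 32, 64, 254):
        print(depth, count_budget_failures(depth, outstanding=40))
```

In this model a burst of 40 submissions is refused 24 times at depth 16 and never at depth 254. Requests that are held back stay in the block layer longer, which fits the higher merge rate seen in the sg counts.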

Right, the behavior basically doesn't change compared with the legacy block
I/O path. And that is why sdev->queue_depth is a bit important for HDDs.

OK

So the initial sdev queue depth comes from cmd_per_lun by default, or is set
manually in the driver via scsi_change_queue_depth(). It seems to me that
some drivers are not setting this optimally, as above.
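
For experimenting without rebuilding a driver, the current value is also visible from userspace in the queue_depth attribute under /sys/block/<disk>/device/. A minimal sketch follows; the helper names and the disk name "sdz" are made up for illustration, and writing assumes root privileges and a driver that allows changing the depth.

```python
from pathlib import Path


def queue_depth_path(disk, sysfs_root="/sys"):
    """Path of the SCSI device queue_depth attribute for a block device."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"


def read_queue_depth(disk, sysfs_root="/sys"):
    """Read the current sdev queue depth as an integer."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())


def write_queue_depth(disk, depth, sysfs_root="/sys"):
    """Request a new sdev queue depth; the driver may clamp or reject it."""
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    queue_depth_path(disk, sysfs_root).write_text(f"{depth}\n")


if __name__ == "__main__":
    disk = "sdz"  # hypothetical disk name; substitute a real SCSI disk
    path = queue_depth_path(disk)
    if path.exists():
        print(disk, read_queue_depth(disk))
    else:
        print(f"{path} not found")
```

The sysfs_root parameter exists only so the helpers can be pointed at a scratch directory for testing.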

Thoughts on guidance for setting sdev queue depth? Could blk-mq have changed
this behavior?

So far, the sdev queue depth is provided by the SCSI layer, and blk-mq can
queue a request only if budget is obtained via .get_budget().

Well, based on my testing, the default sdev queue depth seems too large for
that LLDD ...

Thanks,
John