Hi Ming,
I was looking at an IOMMU issue on an LSI RAID 3008 card, and noticed that
performance there is lower than what I get on other SAS HBAs.
After some debugging and fiddling with the sdev queue depth in the mpt3sas
driver, I am finding that performance changes appreciably with sdev queue depth:
sdev qdepth      fio number jobs*
                 1       10      20
16               1590    1654    1660
32               1545    1646    1654
64               1436    1085    1070
254 (default)    1436    1070    1050
What do the performance numbers mean: IOPS or something else? What is the
fio IO test: random IO or sequential IO?
So those figures are read performance in units of 1K IOPS; so 1590, above,
is 1.59M IOPS read. Here's the fio script:
[global]
rw=read
direct=1
ioengine=libaio
iodepth=40
numjobs=20
bs=4k
;size=10240000m
;zero_buffers=1
group_reporting=1
;ioscheduler=noop
;cpumask=0xffe
;cpus_allowed=1-47
;gtod_reduce=1
;iodepth_batch=2
;iodepth_batch_complete=2
runtime=60
;thread
loops = 10000
fio queue depth is 40, and I'm using 12x SAS SSDs.
I got a comparable disparity in results for fio queue depth = 128 and
num jobs = 1:
sdev qdepth      fio number jobs*
                 1
16               1640
32               1618
64               1577
254 (default)    1437
IO sched = none.
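To put the disparity in relative terms, here is a small sketch that only redoes arithmetic on the first table above (fio iodepth=40). The helper names are made up for illustration, and the bandwidth conversion assumes bs=4k means 4096 bytes, as fio does by default.

```python
# Figures copied from the first table above, in units of 1K IOPS (4k reads).
# Keys are the sdev queue depth; each tuple is fio numjobs = 1, 10, 20.
KIOPS_IODEPTH_40 = {
    16: (1590, 1654, 1660),
    32: (1545, 1646, 1654),
    64: (1436, 1085, 1070),
    254: (1436, 1070, 1050),
}

BLOCK_SIZE = 4096  # bs=4k from the fio script


def drop_percent(best, worst):
    """Relative loss going from `best` to `worst`, in percent."""
    return 100.0 * (best - worst) / best


def bandwidth_gb_per_s(kiops, block_size=BLOCK_SIZE):
    """Convert a x1K IOPS figure to decimal GB/s for the given block size."""
    return kiops * 1000 * block_size / 1e9


if __name__ == "__main__":
    for col, jobs in enumerate((1, 10, 20)):
        best = KIOPS_IODEPTH_40[16][col]
        worst = KIOPS_IODEPTH_40[254][col]
        print(f"numjobs={jobs:2d}: {best}K -> {worst}K IOPS, "
              f"{drop_percent(best, worst):.1f}% lower at qdepth 254, "
              f"{bandwidth_gb_per_s(best):.2f} GB/s at qdepth 16")
```

With 20 jobs the default depth of 254 comes out roughly 37% below a depth of 16, against about 10% with a single job.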
That driver also sets queue depth tracking = 1, but it never seems to kick in.
So it seems to me that the block layer is merging more bios per request, as
the average sg count per request goes up from 1 to 6 or more. As far as I can
see, when the queue depth is lowered, the only thing that really changes is
that we fail more often in getting the budget in
scsi_mq_get_budget()->scsi_dev_queue_ready().
Right, the behavior basically doesn't change compared with the legacy block
IO path. And that is why sdev->queue_depth is a bit important for HDD.
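The budget gating discussed above can be pictured with a toy user-space model. This is only a sketch under stated assumptions, not the kernel code: the class SdevBudget, its fields and submit_burst() are hypothetical names, with `busy` standing in for the in-flight command count and `queue_depth` for sdev->queue_depth.

```python
import threading


class SdevBudget:
    """Toy stand-in for the per-device budget; NOT the kernel implementation.

    All names here are hypothetical and exist only for illustration.
    """

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth  # plays the role of sdev->queue_depth
        self.busy = 0                   # commands currently holding budget
        self.failed_gets = 0            # how often get_budget() was refused
        self._lock = threading.Lock()

    def get_budget(self):
        """Take one unit of budget, or refuse if the device is at its depth."""
        with self._lock:
            if self.busy >= self.queue_depth:
                self.failed_gets += 1
                return False
            self.busy += 1
            return True

    def put_budget(self):
        """Return one unit of budget, e.g. when a command completes."""
        with self._lock:
            assert self.busy > 0
            self.busy -= 1


def submit_burst(budget, nr_requests):
    """Try to dispatch nr_requests at once; return (dispatched, left_queued)."""
    dispatched = sum(1 for _ in range(nr_requests) if budget.get_budget())
    return dispatched, nr_requests - dispatched
```

In this model, a burst of 40 requests against a depth of 16 dispatches 16 and leaves 24 waiting, while a depth of 254 dispatches all 40 at once. The requests left waiting are the ones that, per the hypothesis above, get a chance to pick up more bios before they are sent to the device.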
OK
So the initial sdev queue depth comes from cmd_per_lun by default, or is set
manually in the driver via scsi_change_queue_depth(). It seems to me that
some drivers are not setting this optimally, as above.
Thoughts on guidance for setting sdev queue depth? Could blk-mq have changed
this behavior?
So far, the sdev queue depth is provided by SCSI layer, and blk-mq can
queue one request only if budget is obtained via .get_budget().
Well, based on my testing, the default sdev queue depth seems too large for
that LLDD ...
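For repeating the experiment without patching the driver, the per-device depth can usually also be changed at runtime through the sysfs queue_depth attribute of the SCSI device. A minimal sketch, assuming the attribute is writable for the LLDD in question and that the script runs as root; the helper names and the `sysfs_root` parameter are hypothetical, the latter only there so the code can be tried against a scratch directory.

```python
from pathlib import Path


def queue_depth_path(disk, sysfs_root="/sys"):
    """Path of the queue_depth attribute for a SCSI disk such as 'sda'."""
    return Path(sysfs_root) / "block" / disk / "device" / "queue_depth"


def get_queue_depth(disk, sysfs_root="/sys"):
    """Read the current sdev queue depth."""
    return int(queue_depth_path(disk, sysfs_root).read_text().strip())


def set_queue_depth(disk, depth, sysfs_root="/sys"):
    """Write a new depth and return the value read back afterwards.

    The value is re-read because the driver may adjust the request.
    """
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    queue_depth_path(disk, sysfs_root).write_text(f"{depth}\n")
    return get_queue_depth(disk, sysfs_root)
```

For example, set_queue_depth("sda", 16) would request a depth of 16 for sda, where the disk name is a placeholder for whichever devices are under test.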
Thanks,
John