On 03/02/2021 15:56, Don.Brace@xxxxxxxxxxxxx wrote:
True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution. You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1. How much performance would that cost you?

Don: For my HBA disk tests...
Dividing can_queue / nr_hw_queues is about a 40% drop: ~380K - 400K IOPS.
Setting nr_hw_queues = 1 results in a 1.5X gain in performance: ~980K IOPS.
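For reference, here is a minimal sketch of the two variants being compared. This is not the actual smartpqi code; ctrl_queue_depth and num_queue_groups are placeholder names for the controller's total command depth and its number of hardware queues.

#include <scsi/scsi_host.h>

static void example_setup_shost(struct Scsi_Host *shost,
				unsigned int ctrl_queue_depth,
				unsigned int num_queue_groups)
{
	/* Option 1: keep all hw queues, split the controller depth between them. */
	shost->nr_hw_queues = num_queue_groups;
	shost->can_queue = ctrl_queue_depth / num_queue_groups;

	/*
	 * Option 2 (simpler): expose a single hw queue that owns the
	 * whole controller depth.
	 *
	 * shost->nr_hw_queues = 1;
	 * shost->can_queue = ctrl_queue_depth;
	 */
}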
So do you just set shost.nr_hw_queues = 1, yet leave the rest of the driver as is?
Please note that when changing nr_hw_queues from many -> 1, the default IO scheduler changes from none -> mq-deadline, but I would hope that would not make such a big difference.
Setting host_tagset = 1
For this, v5.11-rc6 has a fix which may affect you (2569063c7140), so please include it
~640K IOPS. So, it seems that setting nr_hw_queues = 1 results in the best performance. Is this expected? Would this also be true in the future?
Not expected by me
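For clarity, a minimal sketch of what "Setting host_tagset = 1" above refers to, assuming a v5.10+ kernel where the Scsi_Host flag exists; the function and parameter names here are placeholders, not smartpqi's.

#include <scsi/scsi_host.h>

static void example_enable_host_tagset(struct Scsi_Host *shost,
				       unsigned int ctrl_queue_depth,
				       unsigned int num_queue_groups)
{
	shost->nr_hw_queues = num_queue_groups;
	shost->can_queue = ctrl_queue_depth;	/* full depth, not divided */
	shost->host_tagset = 1;			/* one tag space shared across all hw queues */
}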
Thanks,
Don Brace

Below is my setup.
---
[3:0:0:0]    disk    HP       EG0900FBLSK       HPD7  /dev/sdd
[3:0:1:0]    disk    HP       EG0900FBLSK       HPD7  /dev/sde
[3:0:2:0]    disk    HP       EG0900FBLSK       HPD7  /dev/sdf
[3:0:3:0]    disk    HP       EH0300FBQDD       HPD5  /dev/sdg
[3:0:4:0]    disk    HP       EG0900FDJYR       HPD4  /dev/sdh
[3:0:5:0]    disk    HP       EG0300FCVBF       HPD9  /dev/sdi
[3:0:6:0]    disk    HP       EG0900FBLSK       HPD7  /dev/sdj
[3:0:7:0]    disk    HP       EG0900FBLSK       HPD7  /dev/sdk
[3:0:8:0]    disk    HP       EG0900FBLSK       HPD7  /dev/sdl
[3:0:9:0]    disk    HP       MO0200FBRWB       HPD9  /dev/sdm
[3:0:10:0]   disk    HP       MM0500FBFVQ       HPD8  /dev/sdn
[3:0:11:0]   disk    ATA      MM0500GBKAK       HPGC  /dev/sdo
[3:0:12:0]   disk    HP       EG0900FBVFQ       HPDC  /dev/sdp
[3:0:13:0]   disk    HP       VO006400JWZJT     HP00  /dev/sdq
[3:0:14:0]   disk    HP       VO015360JWZJN     HP00  /dev/sdr
[3:0:15:0]   enclosu HP       D3700             5.04  -
[3:0:16:0]   enclosu HP       D3700             5.04  -
[3:0:17:0]   enclosu HPE      Smart Adapter     3.00  -
[3:1:0:0]    disk    HPE      LOGICAL VOLUME    3.00  /dev/sds
[3:2:0:0]    storage HPE      P408e-p SR Gen10  3.00  -
-----
[global]
ioengine=libaio
; rw=randwrite
; percentage_random=40
rw=write
size=100g
bs=4k
direct=1
ramp_time=15
; filename=/mnt/fio_test
; cpus_allowed=0-27
iodepth=4096
I normally use iodepth circa 40 to 128, but then I normally just do rw=read for performance testing (a rough sketch of such a job file follows the device list below).
[/dev/sdd]
[/dev/sde]
[/dev/sdf]
[/dev/sdg]
[/dev/sdh]
[/dev/sdi]
[/dev/sdj]
[/dev/sdk]
[/dev/sdl]
[/dev/sdm]
[/dev/sdn]
[/dev/sdo]
[/dev/sdp]
[/dev/sdq]
[/dev/sdr]
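As mentioned above the job list, a job file closer to what I would normally run looks roughly like the sketch below. It keeps the same layout as yours, but uses rw=read and an iodepth picked from the 40-128 range; the exact values are only an example.

[global]
ioengine=libaio
rw=read
bs=4k
direct=1
ramp_time=15
iodepth=64

[/dev/sdd]
[/dev/sde]
; ...one job section per device, as in the list above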