On 22/02/2021 14:23, Roger Willcocks wrote:
FYI we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.)
Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives configured as five x twelve-drive raid-6, software striped using md, and formatted with xfs.
Test software writes to the array using multiple threads in parallel.
The smartpqi driver would report the controller offline within ten minutes or so, with status code 0x6100c.
Changed the driver to set 'nr_hw_queues = 1' and then tested by filling the array with random files (which took a couple of days). That completed fine, so it looks like that one-line change fixes it.
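The change itself is tiny: force a single blk-mq hardware queue when the driver sets up its Scsi_Host. A rough sketch of the idea (the exact spot in the smartpqi host-setup path and the surrounding context are assumptions, not a verified patch):

    /* in the smartpqi host setup path, before scsi_add_host() */
    shost->nr_hw_queues = 1;    /* was: one hw queue per controller queue group */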
That just makes the driver single-queue.
As such, since the driver uses blk_mq_unique_tag_to_hwq(), only hw queue
#0 will ever be used in the driver.
And then, since the driver still spreads MSI-X interrupt vectors over
all CPUs [from pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)], if the CPUs associated
with HW queue #0 are offlined (probably just cpu0), there are no CPUs
available to service queue #0's interrupt. That's what I think would
happen, from a quick glance at the code.
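To spell out the two pieces involved (a simplified sketch, not the literal smartpqi code; local names such as pci_dev and num_queue_groups are placeholders):

    /* submission side: the hw queue index is encoded in the request's
     * unique tag, so with nr_hw_queues = 1 it always evaluates to 0
     */
    u32 unique_tag = blk_mq_unique_tag(scmd->request);
    u16 hw_queue   = blk_mq_unique_tag_to_hwq(unique_tag);

    /* interrupt side: managed MSI-X vectors spread over all CPUs */
    num_vectors = pci_alloc_irq_vectors(pci_dev, 1, num_queue_groups,
                                        PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);

If every CPU in the affinity mask of queue #0's vector goes offline, nothing is left to service its completions.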
It would, of course, be helpful if this was back-ported.
—
Roger
On 3 Feb 2021, at 15:56, Don.Brace@xxxxxxxxxxxxx wrote:
-----Original Message-----
From: Martin Wilck [mailto:mwilck@xxxxxxxx]
Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early
That confirmed my suspicions - it looks like the host is sent more commands
than it can handle. We would need many disks to see this issue, though,
which you have.
So for stable kernels, 6eb045e092ef is not in 5.4; the next one is 5.10, and
I suppose it could simply be fixed there by setting .host_tagset in the SCSI
host template.
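Concretely, something along these lines in the driver's template (a sketch; the template symbol name in smartpqi is assumed here, not checked):

    static struct scsi_host_template pqi_driver_template = {
        .module       = THIS_MODULE,
        ...
        .host_tagset  = 1,    /* share one tag set across all hw queues */
    };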
Thanks,
John
--
Don: Even though this works for current kernels, what would the chances be of
getting this back-ported to 5.9 or even further back?
Otherwise the original patch smartpqi_fix_host_qdepth_limit would
correct this issue for older kernels.
True. However, this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution.
You could simply divide can_queue by nr_hw_queues, as suggested before, or, even simpler, set nr_hw_queues = 1.
How much performance would that cost you?
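The first workaround is also a one-liner at host setup time; a sketch (the name on the right-hand side is a placeholder for the controller's total command slots, not an actual smartpqi field):

    /* give each of the nr_hw_queues hw queues an equal share of the
     * controller's command slots, so the host can't be oversubscribed
     */
    shost->can_queue = total_controller_commands / shost->nr_hw_queues;

(The nr_hw_queues = 1 variant is the same one-line change Roger tested above.)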
Don: For my HBA disk tests...
Dividing can_queue by nr_hw_queues is about a 40% drop:
~380K - 400K IOPS
Setting nr_hw_queues = 1 results in about a 1.5X gain in performance:
~980K IOPS
Setting host_tagset = 1:
~640K IOPS
So, it seems that setting nr_hw_queues = 1 results in the best performance.
Is this expected? Would this also be true for the future?
Thanks,
Don Brace
Below is my setup.
---
[3:0:0:0] disk HP EG0900FBLSK HPD7 /dev/sdd
[3:0:1:0] disk HP EG0900FBLSK HPD7 /dev/sde
[3:0:2:0] disk HP EG0900FBLSK HPD7 /dev/sdf
[3:0:3:0] disk HP EH0300FBQDD HPD5 /dev/sdg
[3:0:4:0] disk HP EG0900FDJYR HPD4 /dev/sdh
[3:0:5:0] disk HP EG0300FCVBF HPD9 /dev/sdi
[3:0:6:0] disk HP EG0900FBLSK HPD7 /dev/sdj
[3:0:7:0] disk HP EG0900FBLSK HPD7 /dev/sdk
[3:0:8:0] disk HP EG0900FBLSK HPD7 /dev/sdl
[3:0:9:0] disk HP MO0200FBRWB HPD9 /dev/sdm
[3:0:10:0] disk HP MM0500FBFVQ HPD8 /dev/sdn
[3:0:11:0] disk ATA MM0500GBKAK HPGC /dev/sdo
[3:0:12:0] disk HP EG0900FBVFQ HPDC /dev/sdp
[3:0:13:0] disk HP VO006400JWZJT HP00 /dev/sdq
[3:0:14:0] disk HP VO015360JWZJN HP00 /dev/sdr
[3:0:15:0] enclosu HP D3700 5.04 -
[3:0:16:0] enclosu HP D3700 5.04 -
[3:0:17:0] enclosu HPE Smart Adapter 3.00 -
[3:1:0:0] disk HPE LOGICAL VOLUME 3.00 /dev/sds
[3:2:0:0] storage HPE P408e-p SR Gen10 3.00 -
-----
[global]
ioengine=libaio
; rw=randwrite
; percentage_random=40
rw=write
size=100g
bs=4k
direct=1
ramp_time=15
; filename=/mnt/fio_test
; cpus_allowed=0-27
iodepth=4096
[/dev/sdd]
[/dev/sde]
[/dev/sdf]
[/dev/sdg]
[/dev/sdh]
[/dev/sdi]
[/dev/sdj]
[/dev/sdk]
[/dev/sdl]
[/dev/sdm]
[/dev/sdn]
[/dev/sdo]
[/dev/sdp]
[/dev/sdq]
[/dev/sdr]
Distribution kernels would be yet another issue; distros can backport host_tagset and get rid of the issue.
Regards
Martin