Re: [PATCH 00/10] mpt3sas: full mq support

Hannes Reinecke <hare@xxxxxxx> · Thu, 16 Feb 2017 11:18:42 +0100

On 02/16/2017 10:48 AM, Kashyap Desai wrote:
>> -----Original Message-----
>> From: Hannes Reinecke [mailto:hare@xxxxxxx]
>> Sent: Wednesday, February 15, 2017 3:35 PM
>> To: Kashyap Desai; Sreekanth Reddy
>> Cc: Christoph Hellwig; Martin K. Petersen; James Bottomley; linux-
>> scsi@xxxxxxxxxxxxxxx; Sathya Prakash Veerichetty; PDL-MPT-FUSIONLINUX
>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>>
>> On 02/15/2017 10:18 AM, Kashyap Desai wrote:
>>>>
>>>>
>>>> Hannes,
>>>>
>>>> Result I have posted last time is with merge operation enabled in
>>>> block layer. If I disable merge operation then I don't see much
>>>> improvement with multiple hw request queues. Here is the result,
>>>>
>>>> fio results when nr_hw_queues=1,
>>>> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905,
>>>> runt=150003msec
>>>>
>>>> fio results when nr_hw_queues=24,
>>>> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393,
>>>> runt=150001msec
>>>
>>> Hannes -
>>>
>>>  I worked with Sreekanth and also understand pros/cons of Patch #10.
>>> " [PATCH 10/10] mpt3sas: scsi-mq interrupt steering"
>>>
>>> In above patch, can_queue of HBA is divided based on logic CPU, it
>>> means we want to mimic as if mpt3sas HBA support multi queue
>>> distributing actual resources which is single Submission H/W Queue.
>>> This approach badly impact many performance areas.
>>>
>>> nr_hw_queues = 1 is what I observe as best performance approach since
>>> it never throttle IO if sdev->queue_depth is set to HBA queue depth.
>>> In case of nr_hw_queues = "CPUs" throttle IO at SCSI level since we
>>> never allow more than "updated can_queue" in LLD.
>>>
>> True.
>> And this was actually one of the things I wanted to demonstrate with this
>> patchset :-) ATM blk-mq really works best when having a distinct tag space
>> per port/device. As soon as the hardware provides a _shared_ tag space you
>> end up with tag starvation issues as blk-mq only allows you to do a static
>> split of the available tagspace.
>> While this patchset demonstrates that the HBA itself _does_ benefit from
>> using block-mq (especially on highly parallel loads), it also demonstrates
>> that
>> _block-mq_ has issues with singlethreaded loads on this HBA (or, rather,
>> type of HBA, as I doubt this issue is affecting mpt3sas only).
>>
>>> Below code bring actual HBA can_queue very low ( Ea on 96 logical core
>>> CPU new can_queue goes to 42, if HBA queue depth is 4K). It means we
>>> will see lots of IO throttling in scsi mid layer due to
>>> shost->can_queue reach the limit very soon if you have <fio> jobs with
>> higher QD.
>>>
>>> 	if (ioc->shost->nr_hw_queues > 1) {
>>> 		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
>>> 		ioc->shost->can_queue /= ioc->msix_vector_count;
>>> 	}
>>> I observe negative performance if I have 8 SSD drives attached to
>>> Ventura (latest IT controller). 16 fio jobs at QD=128 gives ~1600K
>>> IOPs and the moment I switch to nr_hw_queues = "CPUs", it gave hardly
>>> ~850K IOPs. This is mainly because of host_busy stuck at very low ~169
>>> on
>> my setup.
>>>
>> Which actually might be an issue with the way scsi is hooked into blk-mq.
>> The SCSI stack is using 'can_queue' as a check for 'host_busy', ie if the
>> host is
>> capable of accepting more commands.
>> As we're limiting can_queue (to get the per-queue command depth
>> correctly) we should be using the _overall_ command depth for the
>> can_queue value itself to make the host_busy check work correctly.
>>
>> I've attached a patch for that; can you test if it makes a difference?
> Hannes -
> Attached patch works fine for me. FYI -  We need to set device queue depth
> to can_queue as we are currently not doing in mpt3sas driver.
> 
> With attached patch when I tried, I see ~2-3% improvement running multiple
> jobs. Single job profile no difference.
> 
> So looks like we are good to reach performance with single nr_hw_queues.
> 
Whee, cool.

> We have some patches to be send so want to know how to rebase this patch
> series as few patches coming from Broadcom. Can we consider below as plan ?
> 
Sure, can do.

> - Patches from 1-7 will be reposted. Also Sreekanth will complete review on
> existing patch 1-7.
> - We need blk_tag support only for nr_hw_queue = 1.
> 
> With that say, we will have many code changes/function without "
> shost_use_blk_mq" check and assume it is single nr_hw_queue supported
> <mpt3sas> driver.
> 
> Ea - Below function can be simplify - just refer tag from scmd->request and
> don't need check of shost_use_blk_mq + nr_hw_queue etc..
> 
> u16
> mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
>         struct scsi_cmnd *scmd)
> {
>         struct scsiio_tracker *request;
>         u16 smid = scmd->request->tag +  1;
>          ...
>          return request->smid;
> }
> 
> - Later we can explore if nr_hw_queue more than one really add benefit.
> From current limited testing, I don't see major performance boost if we have
> nr_hw_queue more than one.
> 
Well, the _actual_ code to support mq is rather trivial, and really
serves as a good testbed for scsi-mq.
I would prefer to leave it in, and disable it via a module parameter.

But in either case, I can rebase the patches to leave any notions of
'nr_hw_queues' to patch 8 for implementing full mq support.

And we need to discuss how to handle MPI2_FUNCTION_SCSI_IO_REQUEST; the
current method doesn't work with blk-mq.
I really would like to see that go, especially as sg/bsg supports the
same functionality ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@xxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)