Re: [PATCH 00/10] mpt3sas: full mq support

On 02/15/2017 10:18 AM, Kashyap Desai wrote:
>>
>>
>> Hannes,
>>
>> The result I posted last time was with the merge operation enabled in
>> the block layer. If I disable merging I don't see much improvement
>> with multiple hw request queues. Here are the results:
>>
>> fio results when nr_hw_queues=1,
>> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905,
>> runt=150003msec
>>
>> fio results when nr_hw_queues=24,
>> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393,
>> runt=150001msec
> 
> Hannes -
> 
>  I worked with Sreekanth and now understand the pros and cons of Patch #10.
> " [PATCH 10/10] mpt3sas: scsi-mq interrupt steering"
> 
> In the above patch, the HBA's can_queue is divided by the number of logical
> CPUs; i.e. we mimic a multi-queue mpt3sas HBA by distributing what is
> actually a single Submission H/W Queue's resources. This approach badly
> impacts many performance areas.
> 
> nr_hw_queues = 1 is what I observe as the best-performing approach, since it
> never throttles IO if sdev->queue_depth is set to the HBA queue depth.
> In the nr_hw_queues = "CPUs" case IO is throttled at the SCSI level, since
> we never allow more than the "updated can_queue" into the LLD.
> 
True.
And this was actually one of the things I wanted to demonstrate with
this patchset :-)
ATM blk-mq really works best with a distinct tag space per
port/device. As soon as the hardware provides a _shared_ tag space you
end up with tag starvation issues, as blk-mq only allows a static
split of the available tag space.
While this patchset demonstrates that the HBA itself _does_ benefit from
using block-mq (especially on highly parallel loads), it also
demonstrates that _block-mq_ has issues with singlethreaded loads on
this HBA (or, rather, type of HBA, as I doubt this issue is affecting
mpt3sas only).

> The code below brings the actual HBA can_queue very low (e.g. on a 96
> logical core CPU the new can_queue goes down to 42 if the HBA queue depth
> is 4K). It means we will see lots of IO throttling in the SCSI mid layer,
> because shost->can_queue reaches its limit very soon if you have <fio>
> jobs with a higher QD.
> 
> 	if (ioc->shost->nr_hw_queues > 1) {
> 		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
> 		ioc->shost->can_queue /= ioc->msix_vector_count;
> 	}
> I observe negative performance with 8 SSD drives attached to Ventura
> (the latest IT controller). 16 fio jobs at QD=128 give ~1600K IOPS, but the
> moment I switch to nr_hw_queues = "CPUs" I get hardly ~850K IOPS. This is
> mainly because host_busy is stuck at a very low ~169 on my setup.
> 
Which actually might be an issue with the way scsi is hooked into blk-mq.
The SCSI stack is using 'can_queue' as a check for 'host_busy', i.e.
whether the host is capable of accepting more commands.
As we're limiting can_queue (to get the per-queue command depth
right) we should be using the _overall_ command depth for the
can_queue value itself to make the host_busy check work correctly.

I've attached a patch for that; can you test if it makes a difference?

> Maybe, as Sreekanth mentioned, the performance improvement you have
> observed is because nomerges=2 was not set and the OS attempts soft
> back/front merges.
> 
> I debugged a live machine and found that we never see the parallel
> instances of "scsi_dispatch_cmd" that we expect, because can_queue is too
> low. If we really had a *very* large HBA QD, patch #10 exposing multiple
> SQs might be useful.
> 
As mentioned, the above patch might help here.
The patch actually _reduced_ throughput on my end, as the requests never
stayed long enough in the queue to be merged. Hence I've refrained from
posting it.
But as you're able to test with SSDs, this patch really should make a
difference, and it certainly should remove the arbitrary stalls due to
host_busy.

> For now, we are looking for an updated version of the patch which only
> keeps the IT HBA in SQ mode (like we are doing in the <megaraid_sas>
> driver) and adds an interface to use blk_tag in both scsi-mq and !scsi-mq
> modes. Sreekanth has already started working on it, but we may need a full
> performance test run before posting the actual patch.
> Maybe we can cherry-pick a few patches from this series and get blk_tag
> support to improve <mpt3sas> performance later, without making
> nr_hw_queues a user tunable.
> 
Sure, no problem with that.
I'll be preparing another submission round, and we can discuss where we
go from there.

Cheers,

Hannes
> Thanks, Kashyap
> 
> 
>>
>> Thanks,
>> Sreekanth


-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@xxxxxxx			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
From df424c8618e0b06ded2d978818e6d3df4a54a61d Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@xxxxxxx>
Date: Wed, 15 Feb 2017 10:58:01 +0100
Subject: [PATCH] mpt3sas: implement 'shared_tags' SCSI host flag

If the HBA implements a host-wide tag space we should signal this
to the SCSI layer to avoid 'can_queue' being reduced and thereby
inducing I/O stalls.

Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 5 ++---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 2 ++
 drivers/scsi/scsi_lib.c              | 2 ++
 include/scsi/scsi_host.h             | 5 +++++
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 9e31cae..520aee4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3544,10 +3544,9 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER *ioc,
 	 */
 	ioc->shost->reserved_cmds = INTERNAL_SCSIIO_CMDS_COUNT;
 	ioc->shost->can_queue = ioc->scsiio_depth - ioc->shost->reserved_cmds;
-	if (ioc->shost->nr_hw_queues > 1) {
+	if (ioc->shost->nr_hw_queues > 1)
 		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
-		ioc->shost->can_queue /= ioc->msix_vector_count;
-	}
+
 	dinitprintk(ioc, pr_info(MPT3SAS_FMT
 		"scsi host: can_queue depth (%d), nr_hw_queues (%d)\n",
 		ioc->name, ioc->shost->can_queue, ioc->shost->nr_hw_queues));
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 14f7a9d..4088e1a 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -8585,6 +8585,7 @@ static int scsih_map_queues(struct Scsi_Host *shost)
 	.shost_attrs			= mpt3sas_host_attrs,
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
+	.shared_tags			= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
 };
 
@@ -8624,6 +8625,7 @@ static int scsih_map_queues(struct Scsi_Host *shost)
 	.shost_attrs			= mpt3sas_host_attrs,
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
+	.shared_tags			= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
 };
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7100aaa..6bb06ed 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2143,6 +2143,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.ops = &scsi_mq_ops;
 	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
 	shost->tag_set.queue_depth = shost->can_queue;
+	if (shost->hostt->shared_tags)
+		shost->tag_set.queue_depth /= shost->nr_hw_queues;
 	shost->tag_set.reserved_tags = shost->reserved_cmds;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index cc83dd6..d344803 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -457,6 +457,11 @@ struct scsi_host_template {
 	unsigned no_async_abort:1;
 
 	/*
+	 * True if the host uses a shared tag space
+	 */
+	unsigned shared_tags:1;
+
+	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
 	unsigned int max_host_blocked;
-- 
1.8.5.6

