> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@xxxxxxx]
> Sent: Wednesday, February 15, 2017 3:35 PM
> To: Kashyap Desai; Sreekanth Reddy
> Cc: Christoph Hellwig; Martin K. Petersen; James Bottomley;
> linux-scsi@xxxxxxxxxxxxxxx; Sathya Prakash Veerichetty; PDL-MPT-FUSIONLINUX
> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>
> On 02/15/2017 10:18 AM, Kashyap Desai wrote:
> >>
> >> Hannes,
> >>
> >> The result I posted last time is with the merge operation enabled in
> >> the block layer. If I disable the merge operation then I don't see
> >> much improvement with multiple hw request queues. Here is the result:
> >>
> >> fio results when nr_hw_queues=1,
> >> 4k read when numjobs=24: io=248387MB, bw=1655.1MB/s, iops=423905,
> >> runt=150003msec
> >>
> >> fio results when nr_hw_queues=24,
> >> 4k read when numjobs=24: io=263904MB, bw=1759.4MB/s, iops=450393,
> >> runt=150001msec
> >
> > Hannes -
> >
> > I worked with Sreekanth and also understand the pros/cons of patch #10,
> > "[PATCH 10/10] mpt3sas: scsi-mq interrupt steering".
> >
> > In the above patch, can_queue of the HBA is divided based on logical
> > CPUs; that is, we want to mimic the mpt3sas HBA as if it supported
> > multiple queues, distributing resources that are actually a single
> > submission H/W queue. This approach badly impacts many performance
> > areas.
> >
> > nr_hw_queues = 1 is what I observe as the best-performing approach,
> > since it never throttles IO if sdev->queue_depth is set to the HBA
> > queue depth. In the case of nr_hw_queues = "CPUs" we throttle IO at
> > the SCSI level, since we never allow more than the "updated can_queue"
> > in the LLD.
> >
> True.
> And this was actually one of the things I wanted to demonstrate with this
> patchset :-) ATM blk-mq really works best when having a distinct tag space
> per port/device. As soon as the hardware provides a _shared_ tag space you
> end up with tag starvation issues, as blk-mq only allows you to do a static
> split of the available tag space.
> While this patchset demonstrates that the HBA itself _does_ benefit from
> using block-mq (especially on highly parallel loads), it also demonstrates
> that _block-mq_ has issues with singlethreaded loads on this HBA (or,
> rather, this type of HBA, as I doubt this issue is affecting mpt3sas only).
>
> > The below code brings the actual HBA can_queue very low (e.g. on a 96
> > logical core CPU the new can_queue goes down to 42 if the HBA queue
> > depth is 4K). It means we will see lots of IO throttling in the SCSI
> > mid layer, because shost->can_queue reaches the limit very soon if you
> > have <fio> jobs with a higher QD.
> >
> >	if (ioc->shost->nr_hw_queues > 1) {
> >		ioc->shost->nr_hw_queues = ioc->msix_vector_count;
> >		ioc->shost->can_queue /= ioc->msix_vector_count;
> >	}
> >
> > I observe negative performance if I have 8 SSD drives attached to
> > Ventura (latest IT controller). 16 fio jobs at QD=128 give ~1600K
> > IOPS, and the moment I switch to nr_hw_queues = "CPUs" it gives hardly
> > ~850K IOPS. This is mainly because host_busy is stuck at a very low
> > ~169 on my setup.
> >
> Which actually might be an issue with the way SCSI is hooked into blk-mq.
> The SCSI stack is using 'can_queue' as a check for 'host_busy', ie if the
> host is capable of accepting more commands.
> As we're limiting can_queue (to get the per-queue command depth correctly)
> we should be using the _overall_ command depth for the can_queue value
> itself to make the host_busy check work correctly.
>
> I've attached a patch for that; can you test if it makes a difference?

Hannes -

Attached patch works fine for me.
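For reference, my understanding of the host_busy gating we are discussing is
roughly the check below in the SCSI mid layer. This is only a paraphrase of
the scsi_host_is_busy()-style logic in drivers/scsi/scsi_lib.c; the function
name and exact shape here are illustrative, and this is not the attached
patch:

	#include <scsi/scsi_host.h>

	/*
	 * Illustrative paraphrase only: once the number of outstanding
	 * commands reaches shost->can_queue, the mid layer stops
	 * dispatching further commands to the LLD.  So dividing
	 * can_queue by the MSI-X vector count throttles the whole host,
	 * not just one hardware queue.
	 */
	static bool example_host_is_busy(struct Scsi_Host *shost)
	{
		if (shost->can_queue > 0 &&
		    atomic_read(&shost->host_busy) >= shost->can_queue)
			return true;

		return false;
	}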
FYI - we need to set the device queue depth to can_queue, which we are
currently not doing in the mpt3sas driver (a rough sketch of what I mean
is at the end of this mail). When I tried with the attached patch, I see
a ~2-3% improvement running multiple jobs; a single-job profile shows no
difference. So it looks like we are good to reach full performance with
a single nr_hw_queue.

We have some patches to be sent, so we want to know how to rebase this
patch series, as a few patches are coming from Broadcom. Can we consider
the below as the plan?

- Patches 1-7 will be reposted. Also, Sreekanth will complete his review
  of the existing patches 1-7.

- We need blk_tag support only for nr_hw_queue = 1. With that, we will
  have many code changes/functions without the "shost_use_blk_mq" check
  and can assume the <mpt3sas> driver supports a single nr_hw_queue.
  E.g. the below function can be simplified - just take the tag from
  scmd->request and drop the checks of shost_use_blk_mq + nr_hw_queue
  etc.

	u16
	mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
		struct scsi_cmnd *scmd)
	{
		struct scsiio_tracker *request;
		u16 smid = scmd->request->tag + 1;
		...
		return request->smid;
	}

- Later we can explore whether more than one nr_hw_queue really adds a
  benefit. From current limited testing, I don't see a major performance
  boost with more than one nr_hw_queue.

> > Maybe, as Sreekanth mentioned, the performance improvement you have
> > observed is because nomerges=2 is not set and the OS will attempt
> > soft back/front merges.
> >
> > I debugged a live machine and understood that we never see parallel
> > instances of "scsi_dispatch_cmd" as we expect, because can_queue is
> > low. If we really had a *very* large HBA QD, this patch #10 to expose
> > multiple SQs may be useful.
> >
> As mentioned, the above patch might help here.
> The patch actually _reduced_ throughput on my end, as the requests never
> stayed long enough in the queue to be merged. Hence I've refrained from
> posting it.
> But as you're able to test with SSDs this patch really should make a
> difference, and certainly should remove the arbitrary stalls due to
> host_busy.
>
> > For now, we are looking for an updated version of the patch which
> > will only keep the IT HBA in SQ mode (like we are doing in the
> > <megaraid_sas> driver) and add an interface to use blk_tag in both
> > scsi-mq and !scsi-mq mode.
> > Sreekanth has already started working on it, but we may need a full
> > performance test run before posting the actual patch.
> > Maybe we can cherry-pick a few patches from this series and get
> > blk_tag support to improve performance of <mpt3sas> later, which will
> > not allow the user to choose nr_hw_queue as a tunable.
> >
> Sure, no problem with that.
> I'll be preparing another submission round, and we can discuss how we go
> from there.
>
> Cheers,
>
> Hannes
>
> > Thanks, Kashyap
> >
> >
> >>
> >> Thanks,
> >> Sreekanth
> >
>
> --
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@xxxxxxx			   +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
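P.S. Referring back to the note at the top about setting the device queue
depth to can_queue: a minimal sketch of what I have in mind is below. The
hook point and the helper name are my assumption for illustration only;
this is not the actual mpt3sas change.

	#include <scsi/scsi_device.h>
	#include <scsi/scsi_host.h>

	/*
	 * Illustration only (hypothetical helper): raise the per-device
	 * queue depth to the host can_queue so a single sdev is not
	 * capped below the HBA queue depth.  In mpt3sas this would sit
	 * in the slave_configure / change_queue_depth path.
	 */
	static int example_set_sdev_depth(struct scsi_device *sdev)
	{
		struct Scsi_Host *shost = sdev->host;

		/* scsi_change_queue_depth() applies and returns the depth */
		return scsi_change_queue_depth(sdev, shost->can_queue);
	}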