On Thu, Feb 9, 2017 at 6:42 PM, Hannes Reinecke <hare@xxxxxxx> wrote:
> On 02/09/2017 02:03 PM, Sreekanth Reddy wrote:
>> On Wed, Feb 1, 2017 at 1:13 PM, Hannes Reinecke <hare@xxxxxxx> wrote:
>>>
>>> On 02/01/2017 08:07 AM, Kashyap Desai wrote:
>>>>>
>>>>> -----Original Message-----
>>>>> From: Hannes Reinecke [mailto:hare@xxxxxxx]
>>>>> Sent: Wednesday, February 01, 2017 12:21 PM
>>>>> To: Kashyap Desai; Christoph Hellwig
>>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@xxxxxxxxxxxxxxx; Sathya Prakash Veerichetty; PDL-MPT-FUSIONLINUX; Sreekanth Reddy
>>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>>>>>
>>>>> On 01/31/2017 06:54 PM, Kashyap Desai wrote:
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Hannes Reinecke [mailto:hare@xxxxxxx]
>>>>>>> Sent: Tuesday, January 31, 2017 4:47 PM
>>>>>>> To: Christoph Hellwig
>>>>>>> Cc: Martin K. Petersen; James Bottomley; linux-scsi@xxxxxxxxxxxxxxx; Sathya Prakash; Kashyap Desai; mpt-fusionlinux.pdl@xxxxxxxxxxxx
>>>>>>> Subject: Re: [PATCH 00/10] mpt3sas: full mq support
>>>>>>>
>>>>>>> On 01/31/2017 11:02 AM, Christoph Hellwig wrote:
>>>>>>>>
>>>>>>>> On Tue, Jan 31, 2017 at 10:25:50AM +0100, Hannes Reinecke wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> this is a patchset to enable full multiqueue support for the
>>>>>>>>> mpt3sas driver.
>>>>>>>>>
>>>>>>>>> While the HBA only has a single mailbox register for submitting
>>>>>>>>> commands, it does have individual receive queues per MSI-X
>>>>>>>>> interrupt and as such does benefit from converting it to full
>>>>>>>>> multiqueue support.
>>>>>>>>
>>>>>>>> Explanation and numbers on why this would be beneficial, please.
>>>>>>>> We should not need multiple submission queues for a single register
>>>>>>>> to benefit from multiple completion queues.
>>>>>>>>
>>>>>>> Well, the actual throughput very strongly depends on the blk-mq-sched
>>>>>>> patches from Jens.
>>>>>>> As this is barely finished I didn't post any numbers yet.
>>>>>>>
>>>>>>> However:
>>>>>>> With multiqueue support:
>>>>>>> 4k seq read : io=60573MB, bw=1009.2MB/s, iops=258353, runt=60021msec
>>>>>>>
>>>>>>> With scsi-mq on 1 queue:
>>>>>>> 4k seq read : io=17369MB, bw=296291KB/s, iops=74072, runt=60028msec
>>>>>>>
>>>>>>> So yes, there _is_ a benefit.
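For readers trying to reproduce this comparison, the number of completion vectors and blk-mq hardware queues actually in use can be sanity-checked from procfs/sysfs. A minimal sketch, assuming an mpt3sas HBA and a member SCSI disk /dev/sdg (the interrupt and device names are illustrative and vary by kernel and adapter):

# Count the MSI-X vectors the driver has registered (one reply queue each);
# the exact interrupt names depend on the kernel version.
grep -c mpt3sas /proc/interrupts

# With scsi-mq active (scsi_mod.use_blk_mq=1 on kernels of this era), each
# blk-mq hardware context appears as a subdirectory of the device's mq/ node.
ls /sys/block/sdg/mq/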
>>
>> Hannes,
>>
>> I have created an md raid0 array with 4 SAS SSD drives using the below command,
>> #mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdg /dev/sdh /dev/sdi /dev/sdj
>>
>> And here is the 'mdadm --detail /dev/md0' command output,
>> ----------------------------------------------------------------------
>> /dev/md0:
>>         Version : 1.2
>>   Creation Time : Thu Feb  9 14:38:47 2017
>>      Raid Level : raid0
>>      Array Size : 780918784 (744.74 GiB 799.66 GB)
>>    Raid Devices : 4
>>   Total Devices : 4
>>     Persistence : Superblock is persistent
>>
>>     Update Time : Thu Feb  9 14:38:47 2017
>>           State : clean
>>  Active Devices : 4
>> Working Devices : 4
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>      Chunk Size : 512K
>>
>>            Name : host_name
>>            UUID : b63f9da7:b7de9a25:6a46ca00:42214e22
>>          Events : 0
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8       96        0      active sync   /dev/sdg
>>        1       8      112        1      active sync   /dev/sdh
>>        2       8      144        2      active sync   /dev/sdj
>>        3       8      128        3      active sync   /dev/sdi
>> ----------------------------------------------------------------------
>>
>> Then I used the below fio profile to run 4K sequential read operations
>> with the nr_hw_queues=1 driver and with the nr_hw_queues=24 driver (as my
>> system has two NUMA nodes, each with 12 CPUs).
>> -----------------------------------------------------
>> [global]
>> ioengine=libaio
>> group_reporting
>> direct=1
>> rw=read
>> bs=4k
>> allow_mounted_write=0
>> iodepth=128
>> runtime=150s
>>
>> [job1]
>> filename=/dev/md0
>> -----------------------------------------------------
>>
>> Here are the fio results when nr_hw_queues=1 (i.e. a single request
>> queue) with various job counts:
>> 1 JOB   4k read : io=213268MB, bw=1421.8MB/s, iops=363975, runt=150001msec
>> 2 JOBs  4k read : io=309605MB, bw=2064.2MB/s, iops=528389, runt=150001msec
>> 4 JOBs  4k read : io=281001MB, bw=1873.4MB/s, iops=479569, runt=150002msec
>> 8 JOBs  4k read : io=236297MB, bw=1575.2MB/s, iops=403236, runt=150016msec
>>
>> Here are the fio results when nr_hw_queues=24 (i.e. multiple request
>> queues) with various job counts:
>> 1 JOB   4k read : io=95194MB,  bw=649852KB/s, iops=162463, runt=150001msec
>> 2 JOBs  4k read : io=189343MB, bw=1262.3MB/s, iops=323142, runt=150001msec
>> 4 JOBs  4k read : io=314832MB, bw=2098.9MB/s, iops=537309, runt=150001msec
>> 8 JOBs  4k read : io=277015MB, bw=1846.8MB/s, iops=472769, runt=150001msec
>>
>> Here we can see that at lower job counts, the single request
>> queue (nr_hw_queues=1) gives more IOPS than multiple request
>> queues (nr_hw_queues=24).
>>
>> Can you please share your fio profile, so that I can try the same thing on
>> my system?
>>
> Have you tried with the latest git update from Jens' for-4.11/block (or
> for-4.11/next) branch?

I am using the below git repo,

https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.11/scsi-queue

Today I will try with Jens' for-4.11/block.

> I've found that using the mq-deadline scheduler gives a noticeable
> performance boost.
>
> The fio job I'm using is essentially the same; you should just make sure
> to specify a 'numjobs=' statement in there.
> Otherwise fio will just use a single CPU, which of course leads to
> adverse effects in the multiqueue case.

Yes, I am providing 'numjobs=' on the fio command line as shown below,

# fio md_fio_profile --numjobs=8 --output=fio_results.txt

Thanks,
Sreekanth

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                   Teamlead Storage & Networking
> hare@xxxxxxx                          +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
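As a practical footnote to the scheduler suggestion above, switching a member disk to mq-deadline and running the shared job file across several CPUs might look roughly like the sketch below (the device name and job-file name are placeholders; mq-deadline is only selectable once the blk-mq scheduling support from Jens' tree is in the running kernel):

# List the schedulers offered for one of the member disks; the active
# one is shown in square brackets.
cat /sys/block/sdg/queue/scheduler

# Switch that disk to mq-deadline (repeat for each member of the md array).
echo mq-deadline > /sys/block/sdg/queue/scheduler

# Run the job file with several jobs so submissions are spread over CPUs.
fio md_fio_profile --numjobs=8 --output=fio_results.txt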