> -----Original Message-----
> From: linux-block-owner@xxxxxxxxxxxxxxx [mailto:linux-block-owner@xxxxxxxxxxxxxxx]
> On Behalf Of Hannes Reinecke
> Sent: Thursday, November 10, 2016 10:05 AM
> To: Jens Axboe <axboe@xxxxxxxxx>; Christoph Hellwig <hch@xxxxxx>
> Cc: SCSI Mailing List <linux-scsi@xxxxxxxxxxxxxxx>; linux-block@xxxxxxxxxxxxxxx
> Subject: Reduced latency is killing performance
>
> Hi all,
>
> this really feels like a follow-up to the discussion we've had in
> Santa Fe, but finally I'm able to substantiate it with some numbers.
>
> I've made a patch to enable the megaraid_sas driver for multiqueue.
> While this is pretty straightforward (I'll be sending the patchset
> later on), the results are ... interesting.
>
> I've run the 'ssd-test.fio' script from Jens' repository, and these
> are the results for MQ/SQ (- is mq, + is sq):
>
> Run status group 0 (all jobs): [4 KiB sequential reads]
> -   READ: io=10641MB, aggrb=181503KB/s
> +   READ: io=18370MB, aggrb=312572KB/s
>
> Run status group 1 (all jobs): [4 KiB random reads]
> -   READ: io=441444KB, aggrb=7303KB/s
> +   READ: io=223108KB, aggrb=3707KB/s
>
> Run status group 2 (all jobs): [4 KiB sequential writes]
> -  WRITE: io=22485MB, aggrb=383729KB/s
> +  WRITE: io=47421MB, aggrb=807581KB/s
>
> Run status group 3 (all jobs): [4 KiB random writes]
> -  WRITE: io=489852KB, aggrb=8110KB/s
> +  WRITE: io=489748KB, aggrb=8134KB/s
>
> Disk stats (read/write):
> -  sda: ios=2834412/5878578, merge=0/0
> +  sda: ios=205278/2680329, merge=4552593/9580622

[deleted minb, maxb, mint, maxt, ticks, in_queue, and util above]

> As you can see, we're really losing performance in the multiqueue
> case.
> And the main reason for that is that we submit about _10 times_ as
> much I/O as we do for the single-queue case.

That script is running:
0) 4 KiB sequential reads
1) 4 KiB random reads
2) 4 KiB sequential writes
3) 4 KiB random writes

I think you're just seeing a lack of merges for the tiny sequential
workloads.  Those are the ones where mq has lower aggrb results.

Check the value in /sys/block/sda/queue/nomerges.  The values are:
    0 = search for fast and slower merges
    1 = only attempt fast merges
    2 = don't attempt any merges

The SNIA Enterprise Solid State Storage Performance Test Specification
(SSS PTS-E) only measures 128 KiB and 1 MiB sequential IOs - it doesn't
test tiny sequential IOs.  Applications may do anything, but I think
most understand that fewer, bigger transfers are more efficient
throughout the IO stack.  A blocksize of 128 KiB would reduce those
IOs by 96%.

For hpsa, we often turned merges off to avoid the overhead while
running applications that generate decent-sized IOs on their own.

Note that the random read aggrb value doubled with mq, and random
writes showed no impact.

You might also want to set cpus_allowed_policy=split to keep threads
from wandering across CPUs (and thus changing queues).

> So I guess having an I/O scheduler is critical, even for the scsi-mq
> case.

blk-mq still supports merges without any scheduler.

---
Robert Elliott, HPE Persistent Memory
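
P.S. To make the nomerges check above concrete, here is a minimal
Python sketch (illustrative only, not part of the original exchange;
the device name is an assumption and defaults to sda unless another
name is passed on the command line):

    #!/usr/bin/env python3
    # Print the current nomerges setting for a block device, using the
    # sysfs attribute discussed above.  Device name defaults to sda.
    import sys
    from pathlib import Path

    MEANINGS = {
        "0": "search for fast and slower merges",
        "1": "only attempt fast merges",
        "2": "don't attempt any merges",
    }

    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    path = Path("/sys/block") / dev / "queue" / "nomerges"
    value = path.read_text().strip()
    print(f"{path}: {value} ({MEANINGS.get(value, 'unknown')})")

Reading the attribute works as any user; writing a new value back into
the same file (e.g. 2 to disable merges, as we did for hpsa) requires
root.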
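
P.P.S. The blocksize arithmetic behind the "reduce those IOs by 96%"
remark, as a trivial sketch in the same spirit:

    # Going from 4 KiB to 128 KiB IOs moves the same amount of data in
    # 1/32 of the requests, i.e. roughly the 96% reduction cited above.
    small_kib, large_kib = 4, 128
    factor = large_kib // small_kib          # 32 small IOs per large IO
    reduction = 1 - small_kib / large_kib    # fraction of IOs eliminated
    print(f"{factor} small IOs per large IO -> {reduction:.1%} fewer IOs")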