Hi all, this really feels like a follow-up to the discussion we've had in Santa Fe, but finally I'm able to substantiate it with some numbers. I've made a patch to enable the megaraid_sas driver for multiqueue. While this is pretty straightforward (I'll be sending the patchset later on), the results are ... interesting. I've run the 'ssd-test.fio' script from Jens' repository, and these results for MQ/SQ (- is mq, + is sq): Run status group 0 (all jobs): - READ: io=10641MB, aggrb=181503KB/s, minb=181503KB/s, maxb=181503KB/s, mint=60033msec, maxt=60033msec + READ: io=18370MB, aggrb=312572KB/s, minb=312572KB/s, maxb=312572KB/s, mint=60181msec, maxt=60181msec Run status group 1 (all jobs): - READ: io=441444KB, aggrb=7303KB/s, minb=7303KB/s, maxb=7303KB/s, mint=60443msec, maxt=60443msec + READ: io=223108KB, aggrb=3707KB/s, minb=3707KB/s, maxb=3707KB/s, mint=60182msec, maxt=60182msec Run status group 2 (all jobs): - WRITE: io=22485MB, aggrb=383729KB/s, minb=383729KB/s, maxb=383729KB/s, mint=60001msec, maxt=60001msec + WRITE: io=47421MB, aggrb=807581KB/s, minb=807581KB/s, maxb=807581KB/s, mint=60129msec, maxt=60129msec Run status group 3 (all jobs): - WRITE: io=489852KB, aggrb=8110KB/s, minb=8110KB/s, maxb=8110KB/s, mint=60399msec, maxt=60399msec + WRITE: io=489748KB, aggrb=8134KB/s, minb=8134KB/s, maxb=8134KB/s, mint=60207msec, maxt=60207msec Disk stats (read/write): - sda: ios=2834412/5878578, merge=0/0, ticks=86269292/48364836, in_queue=135345876, util=99.20% + sda: ios=205278/2680329, merge=4552593/9580622, ticks=12539912/12965228, in_queue=25512312, util=99.59% As you can see, we're really losing performance in the multiqueue case. And the main reason for that is that we submit about _10 times_ as much I/O as we do for the single-queue case. So I guess having an I/O scheduler is critical, even for the scsi-mq case. Full fio output attached. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@xxxxxxx +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg)
# Do some important numbers on SSD drives, to gauge what kind of # performance you might get out of them. # # Sequential read and write speeds are tested, these are expected to be # high. Random reads should also be fast, random writes are where crap # drives are usually separated from the good drives. # # This uses a queue depth of 4. New SATA SSD's will support up to 32 # in flight commands, so it may also be interesting to increase the queue # depth and compare. Note that most real-life usage will not see that # large of a queue depth, so 4 is more representative of normal use. # [global] ioengine=libaio iodepth=256 size=10g direct=1 runtime=60 numjobs=16 directory=/home filename=ssd.test.file group_reporting [seq-read] rw=read stonewall [rand-read] rw=randread stonewall [seq-write] rw=write stonewall [rand-write] rw=randwrite stonewall
seq-read: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... rand-read: (g=1): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... seq-write: (g=2): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... rand-write: (g=3): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... fio-2.15 Starting 64 processes seq-read: (groupid=0, jobs=16): err= 0: pid=3715: Thu Nov 10 16:48:49 2016 read : io=18370MB, bw=312573KB/s, iops=78143, runt= 60181msec slat (usec): min=1, max=5664, avg= 2.33, stdev= 5.74 clat (usec): min=30, max=1283.3K, avg=52320.04, stdev=114862.40 lat (usec): min=37, max=1283.3K, avg=52322.47, stdev=114862.35 clat percentiles (usec): | 1.00th=[ 620], 5.00th=[ 1432], 10.00th=[ 2096], 20.00th=[ 2544], | 30.00th=[ 2608], 40.00th=[ 2640], 50.00th=[ 2672], 60.00th=[ 2736], | 70.00th=[ 2864], 80.00th=[14912], 90.00th=[284672], 95.00th=[342016], | 99.00th=[419840], 99.50th=[468992], 99.90th=[585728], 99.95th=[634880], | 99.99th=[749568] lat (usec) : 50=0.01%, 100=0.01%, 250=0.20%, 500=0.47%, 750=1.07% lat (usec) : 1000=0.90% lat (msec) : 2=5.92%, 4=63.81%, 10=5.22%, 20=3.28%, 50=1.84% lat (msec) : 100=1.19%, 250=4.25%, 500=11.54%, 750=0.28%, 1000=0.01% lat (msec) : 2000=0.01% cpu : usr=0.65%, sys=1.42%, ctx=78762, majf=0, minf=9372 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=4702738/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 rand-read: (groupid=1, jobs=16): err= 0: pid=3772: Thu Nov 10 16:48:49 2016 read : io=223108KB, bw=3707.3KB/s, iops=926, runt= 60182msec slat (usec): min=2, max=1131.3K, avg=17217.00, stdev=95963.21 clat (msec): min=16, max=5492, avg=4255.60, stdev=671.40 lat (msec): min=171, max=6020, avg=4272.81, stdev=675.11 clat percentiles (msec): | 1.00th=[ 717], 5.00th=[ 3195], 10.00th=[ 4080], 20.00th=[ 4228], | 30.00th=[ 4359], 40.00th=[ 4359], 50.00th=[ 4424], 60.00th=[ 4424], | 70.00th=[ 4490], 80.00th=[ 4490], 90.00th=[ 4555], 95.00th=[ 4752], | 99.00th=[ 4817], 99.50th=[ 4948], 99.90th=[ 5473], 99.95th=[ 5473], | 99.99th=[ 5473] lat (msec) : 20=0.01%, 100=0.02%, 250=0.24%, 500=0.40%, 750=0.40% lat (msec) : 1000=0.46%, 2000=1.61%, >=2000=96.87% cpu : usr=0.02%, sys=0.04%, ctx=1929, majf=0, minf=4259 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=0.5%, 32=0.9%, >=64=98.2% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=55777/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 seq-write: (groupid=2, jobs=16): err= 0: pid=3790: Thu Nov 10 16:48:49 2016 write: io=47421MB, bw=807582KB/s, iops=201895, runt= 60129msec slat (usec): min=2, max=3039, avg= 4.85, stdev=16.02 clat (usec): min=95, max=794322, avg=20251.85, stdev=67558.84 lat (usec): min=123, max=794326, avg=20256.83, stdev=67559.50 clat percentiles (usec): | 1.00th=[ 900], 5.00th=[ 932], 10.00th=[ 956], 20.00th=[ 996], | 30.00th=[ 1048], 40.00th=[ 1112], 50.00th=[ 1160], 60.00th=[ 1224], | 70.00th=[ 1400], 80.00th=[ 1720], 90.00th=[ 2960], 95.00th=[232448], | 99.00th=[296960], 99.50th=[313344], 99.90th=[366592], 99.95th=[489472], | 99.99th=[774144] lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.09%, 1000=20.47% lat (msec) : 2=64.34%, 4=5.85%, 10=0.24%, 20=0.33%, 50=0.81% lat (msec) : 100=0.54%, 250=3.00%, 500=4.26%, 750=0.02%, 1000=0.03% cpu : usr=1.41%, sys=5.44%, ctx=558403, majf=0, minf=12550 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=0/w=12139774/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 rand-write: (groupid=3, jobs=16): err= 0: pid=3806: Thu Nov 10 16:48:49 2016 write: io=489748KB, bw=8134.5KB/s, iops=2033, runt= 60207msec slat (usec): min=3, max=330352, avg=7843.31, stdev=43990.65 clat (msec): min=65, max=3671, avg=2001.23, stdev=448.67 lat (msec): min=68, max=3671, avg=2009.07, stdev=449.18 clat percentiles (msec): | 1.00th=[ 285], 5.00th=[ 693], 10.00th=[ 1680], 20.00th=[ 1991], | 30.00th=[ 2040], 40.00th=[ 2089], 50.00th=[ 2114], 60.00th=[ 2147], | 70.00th=[ 2180], 80.00th=[ 2245], 90.00th=[ 2343], 95.00th=[ 2376], | 99.00th=[ 2540], 99.50th=[ 2606], 99.90th=[ 2704], 99.95th=[ 2737], | 99.99th=[ 2868] lat (msec) : 100=0.08%, 250=0.68%, 500=1.72%, 750=3.60%, 1000=1.28% lat (msec) : 2000=14.04%, >=2000=78.60% cpu : usr=0.03%, sys=0.14%, ctx=11009, majf=0, minf=172 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=0/w=122437/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 Run status group 0 (all jobs): READ: io=18370MB, aggrb=312572KB/s, minb=312572KB/s, maxb=312572KB/s, mint=60181msec, maxt=60181msec Run status group 1 (all jobs): READ: io=223108KB, aggrb=3707KB/s, minb=3707KB/s, maxb=3707KB/s, mint=60182msec, maxt=60182msec Run status group 2 (all jobs): WRITE: io=47421MB, aggrb=807581KB/s, minb=807581KB/s, maxb=807581KB/s, mint=60129msec, maxt=60129msec Run status group 3 (all jobs): WRITE: io=489748KB, aggrb=8134KB/s, minb=8134KB/s, maxb=8134KB/s, mint=60207msec, maxt=60207msec Disk stats (read/write): sda: ios=205278/2680329, merge=4552593/9580622, ticks=12539912/12965228, in_queue=25512312, util=99.59%
seq-read: (g=0): rw=read, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... rand-read: (g=1): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... seq-write: (g=2): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... rand-write: (g=3): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256 ... fio-2.15 Starting 64 processes seq-read: (groupid=0, jobs=16): err= 0: pid=12365: Thu Nov 10 16:34:03 2016 read : io=10641MB, bw=181503KB/s, iops=45375, runt= 60033msec slat (usec): min=2, max=168902, avg=349.71, stdev=2179.65 clat (usec): min=27, max=1046.3K, avg=89851.03, stdev=94205.19 lat (usec): min=921, max=1046.3K, avg=90200.97, stdev=94519.88 clat percentiles (msec): | 1.00th=[ 6], 5.00th=[ 11], 10.00th=[ 16], 20.00th=[ 23], | 30.00th=[ 32], 40.00th=[ 43], 50.00th=[ 57], 60.00th=[ 77], | 70.00th=[ 101], 80.00th=[ 139], 90.00th=[ 210], 95.00th=[ 289], | 99.00th=[ 445], 99.50th=[ 506], 99.90th=[ 644], 99.95th=[ 709], | 99.99th=[ 881] lat (usec) : 50=0.01%, 1000=0.01% lat (msec) : 2=0.01%, 4=0.22%, 10=3.90%, 20=12.47%, 50=29.05% lat (msec) : 100=23.95%, 250=23.49%, 500=6.38%, 750=0.51%, 1000=0.03% lat (msec) : 2000=0.01% cpu : usr=0.38%, sys=1.53%, ctx=1261134, majf=0, minf=7465 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=2724047/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 rand-read: (groupid=1, jobs=16): err= 0: pid=12424: Thu Nov 10 16:34:03 2016 read : io=441444KB, bw=7303.5KB/s, iops=1825, runt= 60443msec slat (usec): min=3, max=5654.6K, avg=8705.16, stdev=98890.89 clat (usec): min=499, max=54071K, avg=2020668.34, stdev=4066596.80 lat (usec): min=534, max=54071K, avg=2029373.76, stdev=4080684.63 clat percentiles (msec): | 1.00th=[ 58], 5.00th=[ 96], 10.00th=[ 120], 20.00th=[ 163], | 30.00th=[ 219], 40.00th=[ 310], 50.00th=[ 482], 60.00th=[ 881], | 70.00th=[ 1598], 80.00th=[ 2868], 90.00th=[ 5604], 95.00th=[ 8455], | 99.00th=[16712], 99.50th=[16712], 99.90th=[16712], 99.95th=[16712], | 99.99th=[16712] lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.01%, 4=0.01%, 10=0.03%, 20=0.08%, 50=0.47% lat (msec) : 100=5.07%, 250=28.49%, 500=16.49%, 750=6.91%, 1000=4.65% lat (msec) : 2000=11.35%, >=2000=26.44% cpu : usr=0.03%, sys=0.10%, ctx=12943, majf=0, minf=4261 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.5%, >=64=99.1% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=110361/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 seq-write: (groupid=2, jobs=16): err= 0: pid=12440: Thu Nov 10 16:34:03 2016 write: io=22485MB, bw=383729KB/s, iops=95932, runt= 60001msec slat (usec): min=4, max=25089, avg=160.99, stdev=186.82 clat (usec): min=9, max=99208, avg=42511.29, stdev=8094.51 lat (usec): min=285, max=99223, avg=42672.98, stdev=8120.87 clat percentiles (usec): | 1.00th=[27008], 5.00th=[35072], 10.00th=[35584], 20.00th=[37120], | 30.00th=[37632], 40.00th=[38656], 50.00th=[40192], 60.00th=[41728], | 70.00th=[43776], 80.00th=[47872], 90.00th=[54528], 95.00th=[60160], | 99.00th=[67072], 99.50th=[70144], 99.90th=[76288], 99.95th=[78336], | 99.99th=[84480] lat (usec) : 10=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.33%, 50=83.21% lat (msec) : 100=16.42% cpu : usr=3.17%, sys=19.97%, ctx=10291638, majf=0, minf=28791 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=0/w=5756037/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 rand-write: (groupid=3, jobs=16): err= 0: pid=12456: Thu Nov 10 16:34:03 2016 write: io=489852KB, bw=8110.3KB/s, iops=2027, runt= 60399msec slat (usec): min=4, max=7321.2K, avg=7849.80, stdev=103311.36 clat (msec): min=30, max=49117, avg=2007.24, stdev=3917.47 lat (msec): min=30, max=49143, avg=2015.09, stdev=3929.12 clat percentiles (msec): | 1.00th=[ 73], 5.00th=[ 120], 10.00th=[ 147], 20.00th=[ 194], | 30.00th=[ 253], 40.00th=[ 322], 50.00th=[ 469], 60.00th=[ 766], | 70.00th=[ 1352], 80.00th=[ 2704], 90.00th=[ 5473], 95.00th=[ 9372], | 99.00th=[16712], 99.50th=[16712], 99.90th=[16712], 99.95th=[16712], | 99.99th=[16712] lat (msec) : 50=0.18%, 100=2.31%, 250=27.05%, 500=22.01%, 750=8.03% lat (msec) : 1000=4.76%, 2000=11.62%, >=2000=24.04% cpu : usr=0.05%, sys=0.16%, ctx=23792, majf=0, minf=181 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.2% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1% issued : total=r=0/w=122463/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=256 Run status group 0 (all jobs): READ: io=10641MB, aggrb=181503KB/s, minb=181503KB/s, maxb=181503KB/s, mint=60033msec, maxt=60033msec Run status group 1 (all jobs): READ: io=441444KB, aggrb=7303KB/s, minb=7303KB/s, maxb=7303KB/s, mint=60443msec, maxt=60443msec Run status group 2 (all jobs): WRITE: io=22485MB, aggrb=383729KB/s, minb=383729KB/s, maxb=383729KB/s, mint=60001msec, maxt=60001msec Run status group 3 (all jobs): WRITE: io=489852KB, aggrb=8110KB/s, minb=8110KB/s, maxb=8110KB/s, mint=60399msec, maxt=60399msec Disk stats (read/write): sda: ios=2834412/5878578, merge=0/0, ticks=86269292/48364836, in_queue=135345876, util=99.20%