Tobias, I'd try 128 jobs, QD 32, and disable the random map and latency measurements:

randrepeat=0 norandommap disable_lat

--
Andrey Kudryavtsev
SSD Solution Architect
Intel Corp.

inet: 83564353
work: +1-916-356-4353
mobile: +1-916-221-2281

On 1/23/17, 10:33 AM, "fio-owner@xxxxxxxxxxxxxxx on behalf of Tobias Oberstein" <fio-owner@xxxxxxxxxxxxxxx on behalf of tobias.oberstein@xxxxxxxxx> wrote:

> You're just running a huge number of threads against the same md device and
> bottleneck on some internal lock. If you step back and set up, say, 256

Ah, alright. Shit.

> threads with ioengine=libaio, qd=128 (to match the in-flight I/O number),
> you'd likely see the locking impact reduced substantially.

The problem with using libaio and QD>1 is that it doesn't represent the workload I am optimizing for. The workload is PostgreSQL, which does all its I/O as regular reads/writes; hence the use of ioengine=sync with large thread counts.

Note: we have an internal tool that is able to parallelize PostgreSQL via database sessions.

--

I tried anyway. Here is what I get with ioengine=libaio (full results down below):

A) QD=128 and jobs=8 (same effective I/O concurrency as previously: 8 x 128 = 1024)

iops=200184

The IOPS stay constant during the run (120 s).

B) QD=128 and jobs=16 (effective concurrency = 2048)

iops=1068.7K

But: the IOPS slowly climb to over 5 million, then collapse to around 20k, and then climb again. Very strange.

C) QD=128 and jobs=32 (effective concurrency = 4096)

fio claims iops=2135.9K, which is still 3.5x lower than what I get with the sync engine and 2800 threads!

Plus there is that strange behavior over the run time: the IOPS climb to 10M

http://picpaste.com/pics/Bildschirmfoto_vom_2017-01-23_19-29-13-ZEyCVcKZ.1485196199.png

and then collapse to 0 IOPS

http://picpaste.com/pics/Bildschirmfoto_vom_2017-01-23_19-30-20-GEEEQR6f.1485196243.png

during which the NVMes show no load at all (I am watching them in another window).

===

libaio is nowhere near what I get with ioengine=sync and high job counts. Mmh. Plus the strange behavior. And, as said, it doesn't represent my workload anyway. I want to stay away from AIO ..

Cheers,
/Tobias
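For reference, the job file used for these runs presumably looks roughly like the sketch below. It is reconstructed from the fio output headers in the results that follow (the thread option is implied by "Starting N threads"); direct=1, time_based and the exact filename are assumptions rather than confirmed contents of postgresql_storage_workload.fio:

[global]
# reconstructed sketch of postgresql_storage_workload.fio (libaio variant)
# direct=1, time_based and the filename are assumed, not confirmed
ioengine=libaio
rw=randread
bs=4k
iodepth=128
direct=1
thread
time_based
runtime=120
group_reporting

[randread]
filename=/dev/md1
# numjobs=8 for run A), 16 for B), 32 for C)
numjobs=8

Andrey's suggested variant would instead use numjobs=128 with iodepth=32, and add randrepeat=0, norandommap and disable_lat=1 to the [global] section.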
A)

oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo fio postgresql_storage_workload.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 8 threads
Jobs: 1 (f=1): [_(2),r(1),_(5)] [38.3% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:23s]
randread: (groupid=0, jobs=8): err= 0: pid=1994: Mon Jan 23 19:23:23 2017
  read : io=93837MB, bw=800739KB/s, iops=200184, runt=120001msec
    slat (usec): min=0, max=4291, avg=39.28, stdev=76.95
    clat (usec): min=2, max=22205, avg=5075.21, stdev=3646.18
     lat (usec): min=5, max=22333, avg=5114.55, stdev=3674.10
    clat percentiles (usec):
     |  1.00th=[  916],  5.00th=[ 1224], 10.00th=[ 1448], 20.00th=[ 1864],
     | 30.00th=[ 2320], 40.00th=[ 2960], 50.00th=[ 3920], 60.00th=[ 5024],
     | 70.00th=[ 6368], 80.00th=[ 8384], 90.00th=[10944], 95.00th=[12608],
     | 99.00th=[14272], 99.50th=[15168], 99.90th=[16768], 99.95th=[17536],
     | 99.99th=[18816]
    bw (KB  /s): min=33088, max=400688, per=12.35%, avg=98898.47, stdev=76253.23
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=0.01%, 500=0.01%, 750=0.22%, 1000=1.48%
    lat (msec) : 2=21.67%, 4=27.51%, 10=35.37%, 20=13.74%, 50=0.01%
  cpu          : usr=1.53%, sys=13.53%, ctx=7504182, majf=0, minf=1032
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=24022368/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: io=93837MB, aggrb=800738KB/s, minb=800738KB/s, maxb=800738KB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
    md1: ios=7485313/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=468407/0, aggrmerge=0/0, aggrticks=51834/0, aggrin_queue=51770, aggrutil=35.00%
  nvme15n1: ios=468133/0, merge=0/0, ticks=52628/0, in_queue=52532, util=34.39%
  nvme6n1: ios=468355/0, merge=0/0, ticks=48944/0, in_queue=48840, util=32.34%
  nvme9n1: ios=468561/0, merge=0/0, ticks=53924/0, in_queue=53956, util=35.00%
  nvme11n1: ios=468354/0, merge=0/0, ticks=53424/0, in_queue=53396, util=34.70%
  nvme2n1: ios=468418/0, merge=0/0, ticks=51536/0, in_queue=51496, util=33.63%
  nvme14n1: ios=468669/0, merge=0/0, ticks=51696/0, in_queue=51576, util=33.84%
  nvme5n1: ios=468526/0, merge=0/0, ticks=50004/0, in_queue=49928, util=33.00%
  nvme8n1: ios=468233/0, merge=0/0, ticks=52232/0, in_queue=52140, util=33.82%
  nvme10n1: ios=468501/0, merge=0/0, ticks=52532/0, in_queue=52416, util=34.29%
  nvme1n1: ios=468434/0, merge=0/0, ticks=53492/0, in_queue=53404, util=34.58%
  nvme13n1: ios=468544/0, merge=0/0, ticks=51876/0, in_queue=51860, util=33.85%
  nvme4n1: ios=468513/0, merge=0/0, ticks=51172/0, in_queue=51176, util=33.30%
  nvme7n1: ios=468245/0, merge=0/0, ticks=50564/0, in_queue=50484, util=33.14%
  nvme0n1: ios=468318/0, merge=0/0, ticks=49812/0, in_queue=49760, util=32.67%
  nvme12n1: ios=468279/0, merge=0/0, ticks=52416/0, in_queue=52344, util=34.17%
  nvme3n1: ios=468442/0, merge=0/0, ticks=53092/0, in_queue=53016, util=34.37%
oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$

B)

oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo fio postgresql_storage_workload.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 16 threads
Jobs: 1 (f=1): [_(15),r(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=16): err= 0: pid=2141: Mon Jan 23 19:27:38 2017
  read : io=500942MB, bw=4174.5MB/s, iops=1068.7K, runt=120001msec
    slat (usec): min=0, max=3647, avg=11.07, stdev=37.60
    clat (usec): min=2, max=19872, avg=1475.65, stdev=2510.83
     lat (usec): min=4, max=19964, avg=1486.76, stdev=2530.31
    clat percentiles (usec):
     |  1.00th=[  334],  5.00th=[  346], 10.00th=[  358], 20.00th=[  362],
     | 30.00th=[  370], 40.00th=[  378], 50.00th=[  398], 60.00th=[  494],
     | 70.00th=[  780], 80.00th=[ 1480], 90.00th=[ 4256], 95.00th=[ 8032],
     | 99.00th=[12096], 99.50th=[12736], 99.90th=[14272], 99.95th=[14912],
     | 99.99th=[16512]
    bw (KB  /s): min=    0, max=1512848, per=8.04%, avg=343481.50, stdev=460791.59
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=0.01%, 500=60.27%, 750=8.95%, 1000=4.94%
    lat (msec) : 2=9.33%, 4=5.98%, 10=7.89%, 20=2.63%
  cpu          : usr=3.19%, sys=44.95%, ctx=9452424, majf=0, minf=2064
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=128241193/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: io=500942MB, aggrb=4174.5MB/s, minb=4174.5MB/s, maxb=4174.5MB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
    md1: ios=9392258/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=588533/0, aggrmerge=0/0, aggrticks=63464/0, aggrin_queue=63476, aggrutil=36.40%
  nvme15n1: ios=588661/0, merge=0/0, ticks=66932/0, in_queue=66824, util=36.40%
  nvme6n1: ios=589278/0, merge=0/0, ticks=60768/0, in_queue=60600, util=34.84%
  nvme9n1: ios=588744/0, merge=0/0, ticks=64344/0, in_queue=64480, util=35.85%
  nvme11n1: ios=588005/0, merge=0/0, ticks=65636/0, in_queue=65828, util=36.02%
  nvme2n1: ios=588097/0, merge=0/0, ticks=62296/0, in_queue=62440, util=35.00%
  nvme14n1: ios=588451/0, merge=0/0, ticks=64480/0, in_queue=64408, util=35.87%
  nvme5n1: ios=588654/0, merge=0/0, ticks=60736/0, in_queue=60704, util=34.66%
  nvme8n1: ios=588843/0, merge=0/0, ticks=63980/0, in_queue=63928, util=35.40%
  nvme10n1: ios=588315/0, merge=0/0, ticks=62436/0, in_queue=62432, util=35.15%
  nvme1n1: ios=588327/0, merge=0/0, ticks=64432/0, in_queue=64564, util=36.10%
  nvme13n1: ios=588342/0, merge=0/0, ticks=65856/0, in_queue=65892, util=36.06%
  nvme4n1: ios=588343/0, merge=0/0, ticks=64528/0, in_queue=64752, util=35.73%
  nvme7n1: ios=589243/0, merge=0/0, ticks=63740/0, in_queue=63696, util=35.34%
  nvme0n1: ios=588499/0, merge=0/0, ticks=61308/0, in_queue=61268, util=34.83%
  nvme12n1: ios=588221/0, merge=0/0, ticks=62076/0, in_queue=61976, util=35.19%
  nvme3n1: ios=588512/0, merge=0/0, ticks=61880/0, in_queue=61824, util=35.09%
oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$

C)

oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo fio postgresql_storage_workload.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 32 threads
Jobs: 1 (f=0): [_(24),r(1),_(7)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=32): err= 0: pid=2263: Mon Jan 23 19:30:49 2017
  read : io=977.76GB, bw=8343.4MB/s, iops=2135.9K, runt=120001msec
    slat (usec): min=0, max=3372, avg= 7.30, stdev=27.48
    clat (usec): min=1, max=21871, avg=997.26, stdev=1995.10
     lat (usec): min=4, max=21982, avg=1004.60, stdev=2010.61
    clat percentiles (usec):
     |  1.00th=[  374],  5.00th=[  378], 10.00th=[  378], 20.00th=[  386],
     | 30.00th=[  390], 40.00th=[  394], 50.00th=[  394], 60.00th=[  398],
     | 70.00th=[  406], 80.00th=[  540], 90.00th=[ 1496], 95.00th=[ 5408],
     | 99.00th=[10944], 99.50th=[12224], 99.90th=[14016], 99.95th=[14784],
     | 99.99th=[16512]
    bw (KB  /s): min=    0, max=1353208, per=5.91%, avg=505187.96, stdev=549388.79
    lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (usec) : 100=0.01%, 250=0.01%, 500=78.69%, 750=5.80%, 1000=2.94%
    lat (msec) : 2=3.84%, 4=2.66%, 10=4.52%, 20=1.56%, 50=0.01%
  cpu          : usr=3.09%, sys=68.19%, ctx=10916103, majf=0, minf=4128
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=256309234/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: io=977.76GB, aggrb=8343.4MB/s, minb=8343.4MB/s, maxb=8343.4MB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
    md1: ios=10762806/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=675866/0, aggrmerge=0/0, aggrticks=70332/0, aggrin_queue=70505, aggrutil=28.65%
  nvme15n1: ios=675832/0, merge=0/0, ticks=69604/0, in_queue=69648, util=27.82%
  nvme6n1: ios=676181/0, merge=0/0, ticks=75584/0, in_queue=75552, util=28.65%
  nvme9n1: ios=675762/0, merge=0/0, ticks=67916/0, in_queue=68236, util=27.79%
  nvme11n1: ios=675745/0, merge=0/0, ticks=68296/0, in_queue=68804, util=27.66%
  nvme2n1: ios=676036/0, merge=0/0, ticks=70904/0, in_queue=71240, util=28.14%
  nvme14n1: ios=675737/0, merge=0/0, ticks=71560/0, in_queue=71716, util=28.13%
  nvme5n1: ios=676592/0, merge=0/0, ticks=71832/0, in_queue=71976, util=28.02%
  nvme8n1: ios=675969/0, merge=0/0, ticks=69152/0, in_queue=69192, util=27.63%
  nvme10n1: ios=675607/0, merge=0/0, ticks=67600/0, in_queue=67668, util=27.74%
  nvme1n1: ios=675528/0, merge=0/0, ticks=72856/0, in_queue=73136, util=28.48%
  nvme13n1: ios=675189/0, merge=0/0, ticks=69736/0, in_queue=70084, util=28.04%
  nvme4n1: ios=676117/0, merge=0/0, ticks=68120/0, in_queue=68600, util=27.88%
  nvme7n1: ios=675726/0, merge=0/0, ticks=72004/0, in_queue=71960, util=28.25%
  nvme0n1: ios=676119/0, merge=0/0, ticks=71228/0, in_queue=71264, util=28.12%
  nvme12n1: ios=675837/0, merge=0/0, ticks=70320/0, in_queue=70368, util=27.99%
  nvme3n1: ios=675887/0, merge=0/0, ticks=68600/0, in_queue=68636, util=27.95%
oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$