Re: 4x lower IOPS: Linux MD vs indiv. devices - why?


 



> You're just running a huge number of threads against the same md device and
> bottleneck on some internal lock. If you step back and set up, say, 256
> threads with ioengine=libaio, qd=128 (to match the in-flight I/O number),
> you'd likely see the locking impact reduced substantially.

Ah, alright. Shit.

The problem with using libaio and QD>1 is that it doesn't represent the workload I am optimizing for.

The workload is PostgreSQL, which does all its I/O as regular reads/writes, hence the use of ioengine=sync with large thread counts.
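For reference, the sync-engine job I am comparing against looks roughly like this (a sketch only; the target device and options are placeholders, not my actual postgresql_storage_workload.fio — numjobs=2800 matches the thread count I mention below):

```ini
; Sketch of a sync-engine random-read job (placeholder values).
[global]
ioengine=sync
rw=randread
bs=4k
direct=1
runtime=120
time_based
group_reporting

[randread]
filename=/dev/md1
numjobs=2800
```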

Note: we have an internal tool that is able to parallelize PostgreSQL via database sessions.

--

I tried anyway. Here is what I get with engine=libaio (results down below):
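The runs A/B/C below use a job file along these lines (again a sketch with a placeholder filename; only numjobs differs between the runs):

```ini
; Sketch of the libaio variant (placeholder filename).
; numjobs was 8, 16 and 32 for runs A, B and C.
[global]
ioengine=libaio
iodepth=128
rw=randread
bs=4k
direct=1
runtime=120
time_based
group_reporting

[randread]
filename=/dev/md1
numjobs=8
```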

A)
QD=128 and jobs=8 (same effective IO concurrency as previously = 1024)

iops=200184

The IOPS stay constant during the run (120s).

B)
QD=128 and jobs=16 (effective concurrency = 2048)

iops=1068.7K

But, but:

The IOPS slowly climb to over 5 million, then collapse to around 20k, and then climb again. Very strange.

C)
QD=128 and jobs=32 (effective concurrency = 4096)

FIO claims: iops=2135.9K

Which is still 3.5x lower than what I get with the sync engine and 2800 threads!
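To make the comparison explicit, the effective in-flight I/O count is simply iodepth times numjobs (a trivial sketch; IOPS values are the ones fio reported for runs A, B and C):

```python
# Effective concurrency per run: iodepth * numjobs.
# IOPS values copied from the fio results for runs A, B, C.
iodepth = 128
for jobs, iops in ((8, 200_184), (16, 1_068_700), (32, 2_135_900)):
    print(f"jobs={jobs:2d}  in-flight={jobs * iodepth}  iops={iops:,}")
```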

Plus there is that strange behavior over the run time: IOPS go up to 10M:

http://picpaste.com/pics/Bildschirmfoto_vom_2017-01-23_19-29-13-ZEyCVcKZ.1485196199.png

and then collapse to 0 IOPS

http://picpaste.com/pics/Bildschirmfoto_vom_2017-01-23_19-30-20-GEEEQR6f.1485196243.png

during which the NVMes don't show any load (I am watching them in another window).

===

libaio gets nowhere near what I get with ioengine=sync and high job counts. Mmh. Plus there is the strange behavior.

And as said, that doesn't represent my workload anyway.

I want to stay away from AIO ..

Cheers,
/Tobias


A)

oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo fio postgresql_storage_workload.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 8 threads
Jobs: 1 (f=1): [_(2),r(1),_(5)] [38.3% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 03m:23s]
randread: (groupid=0, jobs=8): err= 0: pid=1994: Mon Jan 23 19:23:23 2017
  read : io=93837MB, bw=800739KB/s, iops=200184, runt=120001msec
    slat (usec): min=0, max=4291, avg=39.28, stdev=76.95
    clat (usec): min=2, max=22205, avg=5075.21, stdev=3646.18
     lat (usec): min=5, max=22333, avg=5114.55, stdev=3674.10
    clat percentiles (usec):
     |  1.00th=[  916],  5.00th=[ 1224], 10.00th=[ 1448], 20.00th=[ 1864],
     | 30.00th=[ 2320], 40.00th=[ 2960], 50.00th=[ 3920], 60.00th=[ 5024],
     | 70.00th=[ 6368], 80.00th=[ 8384], 90.00th=[10944], 95.00th=[12608],
     | 99.00th=[14272], 99.50th=[15168], 99.90th=[16768], 99.95th=[17536],
     | 99.99th=[18816]
    bw (KB /s): min=33088, max=400688, per=12.35%, avg=98898.47, stdev=76253.23
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=0.01%, 500=0.01%, 750=0.22%, 1000=1.48%
    lat (msec) : 2=21.67%, 4=27.51%, 10=35.37%, 20=13.74%, 50=0.01%
  cpu          : usr=1.53%, sys=13.53%, ctx=7504182, majf=0, minf=1032
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=24022368/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: io=93837MB, aggrb=800738KB/s, minb=800738KB/s, maxb=800738KB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
md1: ios=7485313/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=468407/0, aggrmerge=0/0, aggrticks=51834/0, aggrin_queue=51770, aggrutil=35.00%
  nvme15n1: ios=468133/0, merge=0/0, ticks=52628/0, in_queue=52532, util=34.39%
  nvme6n1: ios=468355/0, merge=0/0, ticks=48944/0, in_queue=48840, util=32.34%
  nvme9n1: ios=468561/0, merge=0/0, ticks=53924/0, in_queue=53956, util=35.00%
  nvme11n1: ios=468354/0, merge=0/0, ticks=53424/0, in_queue=53396, util=34.70%
  nvme2n1: ios=468418/0, merge=0/0, ticks=51536/0, in_queue=51496, util=33.63%
  nvme14n1: ios=468669/0, merge=0/0, ticks=51696/0, in_queue=51576, util=33.84%
  nvme5n1: ios=468526/0, merge=0/0, ticks=50004/0, in_queue=49928, util=33.00%
  nvme8n1: ios=468233/0, merge=0/0, ticks=52232/0, in_queue=52140, util=33.82%
  nvme10n1: ios=468501/0, merge=0/0, ticks=52532/0, in_queue=52416, util=34.29%
  nvme1n1: ios=468434/0, merge=0/0, ticks=53492/0, in_queue=53404, util=34.58%
  nvme13n1: ios=468544/0, merge=0/0, ticks=51876/0, in_queue=51860, util=33.85%
  nvme4n1: ios=468513/0, merge=0/0, ticks=51172/0, in_queue=51176, util=33.30%
  nvme7n1: ios=468245/0, merge=0/0, ticks=50564/0, in_queue=50484, util=33.14%
  nvme0n1: ios=468318/0, merge=0/0, ticks=49812/0, in_queue=49760, util=32.67%
  nvme12n1: ios=468279/0, merge=0/0, ticks=52416/0, in_queue=52344, util=34.17%
  nvme3n1: ios=468442/0, merge=0/0, ticks=53092/0, in_queue=53016, util=34.37%
oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$


B)

oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo fio postgresql_storage_workload.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 16 threads
Jobs: 1 (f=1): [_(15),r(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=16): err= 0: pid=2141: Mon Jan 23 19:27:38 2017
  read : io=500942MB, bw=4174.5MB/s, iops=1068.7K, runt=120001msec
    slat (usec): min=0, max=3647, avg=11.07, stdev=37.60
    clat (usec): min=2, max=19872, avg=1475.65, stdev=2510.83
     lat (usec): min=4, max=19964, avg=1486.76, stdev=2530.31
    clat percentiles (usec):
     |  1.00th=[  334],  5.00th=[  346], 10.00th=[  358], 20.00th=[  362],
     | 30.00th=[  370], 40.00th=[  378], 50.00th=[  398], 60.00th=[  494],
     | 70.00th=[  780], 80.00th=[ 1480], 90.00th=[ 4256], 95.00th=[ 8032],
     | 99.00th=[12096], 99.50th=[12736], 99.90th=[14272], 99.95th=[14912],
     | 99.99th=[16512]
    bw (KB /s): min= 0, max=1512848, per=8.04%, avg=343481.50, stdev=460791.59
    lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
    lat (usec) : 250=0.01%, 500=60.27%, 750=8.95%, 1000=4.94%
    lat (msec) : 2=9.33%, 4=5.98%, 10=7.89%, 20=2.63%
  cpu          : usr=3.19%, sys=44.95%, ctx=9452424, majf=0, minf=2064
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=128241193/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: io=500942MB, aggrb=4174.5MB/s, minb=4174.5MB/s, maxb=4174.5MB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
md1: ios=9392258/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=588533/0, aggrmerge=0/0, aggrticks=63464/0, aggrin_queue=63476, aggrutil=36.40%
  nvme15n1: ios=588661/0, merge=0/0, ticks=66932/0, in_queue=66824, util=36.40%
  nvme6n1: ios=589278/0, merge=0/0, ticks=60768/0, in_queue=60600, util=34.84%
  nvme9n1: ios=588744/0, merge=0/0, ticks=64344/0, in_queue=64480, util=35.85%
  nvme11n1: ios=588005/0, merge=0/0, ticks=65636/0, in_queue=65828, util=36.02%
  nvme2n1: ios=588097/0, merge=0/0, ticks=62296/0, in_queue=62440, util=35.00%
  nvme14n1: ios=588451/0, merge=0/0, ticks=64480/0, in_queue=64408, util=35.87%
  nvme5n1: ios=588654/0, merge=0/0, ticks=60736/0, in_queue=60704, util=34.66%
  nvme8n1: ios=588843/0, merge=0/0, ticks=63980/0, in_queue=63928, util=35.40%
  nvme10n1: ios=588315/0, merge=0/0, ticks=62436/0, in_queue=62432, util=35.15%
  nvme1n1: ios=588327/0, merge=0/0, ticks=64432/0, in_queue=64564, util=36.10%
  nvme13n1: ios=588342/0, merge=0/0, ticks=65856/0, in_queue=65892, util=36.06%
  nvme4n1: ios=588343/0, merge=0/0, ticks=64528/0, in_queue=64752, util=35.73%
  nvme7n1: ios=589243/0, merge=0/0, ticks=63740/0, in_queue=63696, util=35.34%
  nvme0n1: ios=588499/0, merge=0/0, ticks=61308/0, in_queue=61268, util=34.83%
  nvme12n1: ios=588221/0, merge=0/0, ticks=62076/0, in_queue=61976, util=35.19%
  nvme3n1: ios=588512/0, merge=0/0, ticks=61880/0, in_queue=61824, util=35.09%
oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$


C)

oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$ sudo fio postgresql_storage_workload.fio
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
fio-2.1.11
Starting 32 threads
Jobs: 1 (f=0): [_(24),r(1),_(7)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=32): err= 0: pid=2263: Mon Jan 23 19:30:49 2017
  read : io=977.76GB, bw=8343.4MB/s, iops=2135.9K, runt=120001msec
    slat (usec): min=0, max=3372, avg= 7.30, stdev=27.48
    clat (usec): min=1, max=21871, avg=997.26, stdev=1995.10
     lat (usec): min=4, max=21982, avg=1004.60, stdev=2010.61
    clat percentiles (usec):
     |  1.00th=[  374],  5.00th=[  378], 10.00th=[  378], 20.00th=[  386],
     | 30.00th=[  390], 40.00th=[  394], 50.00th=[  394], 60.00th=[  398],
     | 70.00th=[  406], 80.00th=[  540], 90.00th=[ 1496], 95.00th=[ 5408],
     | 99.00th=[10944], 99.50th=[12224], 99.90th=[14016], 99.95th=[14784],
     | 99.99th=[16512]
    bw (KB /s): min= 0, max=1353208, per=5.91%, avg=505187.96, stdev=549388.79
    lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (usec) : 100=0.01%, 250=0.01%, 500=78.69%, 750=5.80%, 1000=2.94%
    lat (msec) : 2=3.84%, 4=2.66%, 10=4.52%, 20=1.56%, 50=0.01%
  cpu          : usr=3.09%, sys=68.19%, ctx=10916103, majf=0, minf=4128
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=256309234/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: io=977.76GB, aggrb=8343.4MB/s, minb=8343.4MB/s, maxb=8343.4MB/s, mint=120001msec, maxt=120001msec

Disk stats (read/write):
md1: ios=10762806/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=675866/0, aggrmerge=0/0, aggrticks=70332/0, aggrin_queue=70505, aggrutil=28.65%
  nvme15n1: ios=675832/0, merge=0/0, ticks=69604/0, in_queue=69648, util=27.82%
  nvme6n1: ios=676181/0, merge=0/0, ticks=75584/0, in_queue=75552, util=28.65%
  nvme9n1: ios=675762/0, merge=0/0, ticks=67916/0, in_queue=68236, util=27.79%
  nvme11n1: ios=675745/0, merge=0/0, ticks=68296/0, in_queue=68804, util=27.66%
  nvme2n1: ios=676036/0, merge=0/0, ticks=70904/0, in_queue=71240, util=28.14%
  nvme14n1: ios=675737/0, merge=0/0, ticks=71560/0, in_queue=71716, util=28.13%
  nvme5n1: ios=676592/0, merge=0/0, ticks=71832/0, in_queue=71976, util=28.02%
  nvme8n1: ios=675969/0, merge=0/0, ticks=69152/0, in_queue=69192, util=27.63%
  nvme10n1: ios=675607/0, merge=0/0, ticks=67600/0, in_queue=67668, util=27.74%
  nvme1n1: ios=675528/0, merge=0/0, ticks=72856/0, in_queue=73136, util=28.48%
  nvme13n1: ios=675189/0, merge=0/0, ticks=69736/0, in_queue=70084, util=28.04%
  nvme4n1: ios=676117/0, merge=0/0, ticks=68120/0, in_queue=68600, util=27.88%
  nvme7n1: ios=675726/0, merge=0/0, ticks=72004/0, in_queue=71960, util=28.25%
  nvme0n1: ios=676119/0, merge=0/0, ticks=71228/0, in_queue=71264, util=28.12%
  nvme12n1: ios=675837/0, merge=0/0, ticks=70320/0, in_queue=70368, util=27.99%
  nvme3n1: ios=675887/0, merge=0/0, ticks=68600/0, in_queue=68636, util=27.95%
oberstet@svr-psql19:~/scm/parcit/RA/adr/system/docs$

--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


