[Single OSD performance on SSD] Can't go over 3.2K IOPS

On 31/08/14 17:55, Mark Kirkwood wrote:
> On 29/08/14 22:17, Sebastien Han wrote:
>
>> @Mark thanks trying this :)
>> Unfortunately, using nobarrier and another dedicated SSD for the
>> journal (plus your ceph setting) didn't bring much; now I can reach
>> 3.5K IOPS.
>> By any chance, would it be possible for you to test with a single OSD
>> SSD?
>>
>
> Funny you should bring this up - I have just updated my home system with
> a pair of Crucial m550s, so I figured I'd try a run with 2x ssd (1 for
> journal, 1 for data) and 1x ssd (journal + data).
>
>
> The results were the opposite of what I expected (see below), with 2x
> ssd getting about 6K IOPS and 1x ssd getting 8K IOPS (wtf):
>
> I'm running this on Ubuntu 14.04 + ceph git master from a few days ago:
>
> $ ceph --version
> ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
>
> The data partition was created with:
>
> $ sudo mkfs.xfs -f -l lazy-count=1 /dev/sdd4
>
> and mounted via:
>
> $ sudo mount -o nobarrier,allocsize=4096 /dev/sdd4 /ceph2
>
>
> I've attached my ceph.conf and the fio template FWIW.
>
> 2x Crucial m550 (1x journal, 1x data)
>
> rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
> iodepth=64
> fio-2.1.11-20-g9a44
> Starting 1 process
> rbd_thread: (groupid=0, jobs=1): err= 0: pid=5511: Sun Aug 31 17:33:40 2014
>    write: io=1024.0MB, bw=24694KB/s, iops=6173, runt= 42462msec
>      slat (usec): min=11, max=4086, avg=51.19, stdev=59.30
>      clat (msec): min=3, max=24, avg= 9.99, stdev= 1.57
>       lat (msec): min=3, max=24, avg=10.04, stdev= 1.57
>      clat percentiles (usec):
>       |  1.00th=[ 6624],  5.00th=[ 7584], 10.00th=[ 8032], 20.00th=[ 8640],
>       | 30.00th=[ 9152], 40.00th=[ 9536], 50.00th=[ 9920], 60.00th=[10304],
>       | 70.00th=[10816], 80.00th=[11328], 90.00th=[11968], 95.00th=[12480],
>       | 99.00th=[13888], 99.50th=[14528], 99.90th=[17024], 99.95th=[19584],
>       | 99.99th=[23168]
>      bw (KB  /s): min=23158, max=25592, per=100.00%, avg=24711.65,
> stdev=470.72
>      lat (msec) : 4=0.01%, 10=50.69%, 20=49.26%, 50=0.04%
>    cpu          : usr=25.27%, sys=2.68%, ctx=266729, majf=0, minf=16773
>    IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.3%, 32=83.8%,
>  >=64=15.8%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       complete  : 0=0.0%, 4=93.8%, 8=2.9%, 16=2.2%, 32=1.0%, 64=0.1%,
>  >=64=0.0%
>       issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=64
>
> 1x Crucial m550 (journal + data)
>
> rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
> iodepth=64
> fio-2.1.11-20-g9a44
> Starting 1 process
> rbd_thread: (groupid=0, jobs=1): err= 0: pid=6887: Sun Aug 31 17:42:22 2014
>    write: io=1024.0MB, bw=32778KB/s, iops=8194, runt= 31990msec
>      slat (usec): min=10, max=4016, avg=45.68, stdev=41.60
>      clat (usec): min=428, max=25688, avg=7658.03, stdev=1600.65
>       lat (usec): min=923, max=25757, avg=7703.72, stdev=1598.77
>      clat percentiles (usec):
>       |  1.00th=[ 3440],  5.00th=[ 5216], 10.00th=[ 6048], 20.00th=[ 6624],
>       | 30.00th=[ 7008], 40.00th=[ 7328], 50.00th=[ 7584], 60.00th=[ 7904],
>       | 70.00th=[ 8256], 80.00th=[ 8640], 90.00th=[ 9280], 95.00th=[10048],
>       | 99.00th=[12864], 99.50th=[14528], 99.90th=[17536], 99.95th=[19328],
>       | 99.99th=[21888]
>      bw (KB  /s): min=30768, max=35160, per=100.00%, avg=32907.35,
> stdev=934.80
>      lat (usec) : 500=0.01%, 1000=0.01%
>      lat (msec) : 2=0.04%, 4=1.80%, 10=93.15%, 20=4.97%, 50=0.04%
>    cpu          : usr=32.32%, sys=3.05%, ctx=179657, majf=0, minf=16751
>    IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=59.7%,
>  >=64=40.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>       complete  : 0=0.0%, 4=96.8%, 8=2.6%, 16=0.5%, 32=0.1%, 64=0.1%,
>  >=64=0.0%
>       issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=64
>

I'm digging a bit more to try to understand this slightly surprising result.

For that last benchmark I'd used a file rather than a device journal on 
the same ssd:

$ ls -l /ceph2
total 15360040
-rw-r--r--  1 root root          37 Sep  1 12:00 ceph_fsid
drwxr-xr-x 68 root root        4096 Sep  1 12:00 current
-rw-r--r--  1 root root          37 Sep  1 12:00 fsid
-rw-r--r--  1 root root 15728640000 Sep  1 12:00 journal
-rw-------  1 root root          56 Sep  1 12:00 keyring
-rw-r--r--  1 root root          21 Sep  1 12:00 magic
-rw-r--r--  1 root root           6 Sep  1 12:00 ready
-rw-r--r--  1 root root           4 Sep  1 12:00 store_version
-rw-r--r--  1 root root          53 Sep  1 12:00 superblock
-rw-r--r--  1 root root           2 Sep  1 12:00 whoami

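For reference, a file journal like that just corresponds to pointing the
osd journal at a path under the data dir. In ceph.conf terms that would be
something like the following sketch (the actual settings are in the
ceph.conf attached earlier):

```ini
[osd]
# journal kept as a plain file inside the osd data directory
osd journal = /ceph2/journal
# journal size in MB; 15000 MB matches the 15728640000-byte file above
osd journal size = 15000
```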

Let's try a more standard device journal on another partition of the 
same ssd.  1x Crucial m550 (device journal + data):

$ ls -l /ceph2
total 36
-rw-r--r--  1 root root   37 Sep  1 12:02 ceph_fsid
drwxr-xr-x 68 root root 4096 Sep  1 12:02 current
-rw-r--r--  1 root root   37 Sep  1 12:02 fsid
lrwxrwxrwx  1 root root    9 Sep  1 12:02 journal -> /dev/sdd1
-rw-------  1 root root   56 Sep  1 12:02 keyring
-rw-r--r--  1 root root   21 Sep  1 12:02 magic
-rw-r--r--  1 root root    6 Sep  1 12:02 ready
-rw-r--r--  1 root root    4 Sep  1 12:02 store_version
-rw-r--r--  1 root root   53 Sep  1 12:02 superblock
-rw-r--r--  1 root root    2 Sep  1 12:02 whoami
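For anyone wanting to reproduce the switch from file journal to device
journal, the sequence is roughly the following sketch (the osd id and
device are illustrative - adjust to your own layout, and make sure
--flush-journal completes before removing the old journal file):

```shell
# stop the osd, drain the old journal, repoint it, recreate, restart
sudo stop ceph-osd id=0                 # upstart on Ubuntu 14.04
sudo ceph-osd -i 0 --flush-journal      # make sure nothing is pending
sudo rm /ceph2/journal
sudo ln -s /dev/sdd1 /ceph2/journal     # hypothetical journal partition
sudo ceph-osd -i 0 --mkjournal
sudo start ceph-osd id=0
```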


rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=4463: Mon Sep  1 09:16:16 2014
   write: io=1024.0MB, bw=22105KB/s, iops=5526, runt= 47436msec
     slat (usec): min=11, max=4054, avg=52.66, stdev=62.79
     clat (msec): min=3, max=43, avg=11.20, stdev= 1.69
      lat (msec): min=4, max=43, avg=11.25, stdev= 1.69
     clat percentiles (usec):
      |  1.00th=[ 7904],  5.00th=[ 8896], 10.00th=[ 9408], 20.00th=[10048],
      | 30.00th=[10432], 40.00th=[10688], 50.00th=[11072], 60.00th=[11456],
      | 70.00th=[11712], 80.00th=[12224], 90.00th=[12992], 95.00th=[13888],
      | 99.00th=[16768], 99.50th=[17792], 99.90th=[20352], 99.95th=[24960],
      | 99.99th=[42240]
     bw (KB  /s): min=20285, max=23537, per=100.00%, avg=22126.98, 
stdev=579.19
     lat (msec) : 4=0.01%, 10=20.03%, 20=79.86%, 50=0.11%
   cpu          : usr=23.48%, sys=2.58%, ctx=302278, majf=0, minf=16786
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.6%, 32=82.8%, 
 >=64=16.6%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=93.9%, 8=3.0%, 16=2.0%, 32=1.0%, 64=0.1%, 
 >=64=0.0%
      issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
      latency   : target=0, window=0, percentile=100.00%, depth=64

So we lose a bit of performance there. Finally, let's use 2 ssds again,
but with a file journal on the 2nd one. 2x Crucial m550 (1x file
journal, 1x data):

rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, 
iodepth=64
Starting 1 process
fio-2.1.11-20-g9a44

rbd_thread: (groupid=0, jobs=1): err= 0: pid=6943: Mon Sep  1 11:18:01 2014
   write: io=1024.0MB, bw=32248KB/s, iops=8062, runt= 32516msec
     slat (usec): min=11, max=4843, avg=45.42, stdev=43.57
     clat (usec): min=657, max=22614, avg=7806.70, stdev=1319.02
      lat (msec): min=1, max=22, avg= 7.85, stdev= 1.32
     clat percentiles (usec):
      |  1.00th=[ 4384],  5.00th=[ 5984], 10.00th=[ 6432], 20.00th=[ 6880],
      | 30.00th=[ 7200], 40.00th=[ 7520], 50.00th=[ 7776], 60.00th=[ 8032],
      | 70.00th=[ 8384], 80.00th=[ 8640], 90.00th=[ 9152], 95.00th=[ 9664],
      | 99.00th=[11328], 99.50th=[13376], 99.90th=[17536], 99.95th=[18304],
      | 99.99th=[21376]
     bw (KB  /s): min=30408, max=35320, per=100.00%, avg=32339.56, 
stdev=937.80
     lat (usec) : 750=0.01%
     lat (msec) : 2=0.03%, 4=0.70%, 10=95.96%, 20=3.29%, 50=0.02%
   cpu          : usr=31.37%, sys=3.42%, ctx=181872, majf=0, minf=16759
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=56.6%, 
 >=64=43.3%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=97.1%, 8=2.4%, 16=0.4%, 32=0.1%, 64=0.1%, 
 >=64=0.0%
      issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
      latency   : target=0, window=0, percentile=100.00%, depth=64

So we are back up to 8K IOPS. Note that we are not maxing out the ssds
(iostat during the run):

Device:  rrqm/s   wrqm/s    r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00     0.00   0.00    0.00    0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
sdb        0.00     0.00   0.00    0.00    0.00    0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
sdd        0.00  5048.00   0.00 7550.00    0.00   83.43     22.63      2.80   0.37     0.00     0.37   0.04  31.60
sdc        0.00     0.00   0.00 7145.00    0.00   72.21     20.70      0.27   0.04     0.00     0.04   0.04  26.80
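As a quick back-of-envelope sanity check on those numbers (all figures
are copied from the fio and iostat output above, plus the vendor's rated
IOPS), the arithmetic works out like this:

```python
# Cross-check the fio and iostat numbers from the 2x-ssd file-journal run.
fio_writes = 262144          # total 4K writes issued (1024 MB / 4 KB)
runtime_s = 32.516           # fio runtime in seconds
iops = fio_writes / runtime_s
print(f"client IOPS: {iops:.0f}")          # ~8062, matching fio's report

# iostat showed sdd doing 7550 w/s at 83.43 MB/s:
avg_write_kb = 83.43 * 1024 / 7550
print(f"avg write size on sdd: {avg_write_kb:.1f} KB")
# ~11.3 KB, i.e. avgrq-sz 22.63 sectors * 512 B - journal writes coalesce.

# Fraction of the drive's rated 75K 4K-random-write IOPS actually used:
print(f"fraction of rated IOPS: {7550 / 75000:.0%}")   # ~10%
```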

This model of ssd (128G m550) is rated for 75K 4k random write IOPS, and
running fio directly on the filesystem I've seen 70K, so that figure is
reasonably believable. So anyway, we are not getting anywhere near the
max IOPS from our devices.

We use the Intel S3700 for our production ceph servers, so I'll see if we
have any spare that I can test on - it would be interesting to see
whether I hit the same 3.5K ceiling or not.

Cheers

Mark



