>> Allegedly this model ssd (128G m550) can do 75K 4k random write IOPS
>> (running fio on the filesystem I've seen 70K IOPS, so that is reasonably
>> believable). So anyway we are not getting anywhere near the max IOPS
>> from our devices.

Hi,

Just check this:
http://www.anandtech.com/show/7864/crucial-m550-review-128gb-256gb-512gb-and-1tb-models-tested/3

If the ssd is full of data, the performance is far from 75K - more like
7K. I think only high-end datacentre ssds (SLC) can provide consistent
results around 40K-50K.

----- Original Message -----
From: "Mark Kirkwood" <mark.kirkwood at catalyst.net.nz>
To: "Sebastien Han" <sebastien.han at enovance.com>, "ceph-users" <ceph-users at lists.ceph.com>
Sent: Monday, 1 September 2014 02:36:45
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

On 31/08/14 17:55, Mark Kirkwood wrote:
> On 29/08/14 22:17, Sebastien Han wrote:
>
>> @Mark thanks for trying this :)
>> Unfortunately using nobarrier and another dedicated SSD for the
>> journal (plus your ceph settings) didn't bring much; now I can reach
>> 3.5K IOPS.
>> By any chance, would it be possible for you to test with a single OSD
>> SSD?
>>
>
> Funny you should bring this up - I have just updated my home system with
> a pair of Crucial m550s, so I figured I'd try a run with 2x ssd (1 for
> journal and 1 for data) and 1x ssd (journal + data).
>
> The results were the opposite of what I expected (see below), with 2x
> ssd getting about 6K IOPS and 1x ssd getting 8K IOPS (wtf):
>
> I'm running this on Ubuntu 14.04 + ceph git master from a few days ago:
>
> $ ceph --version
> ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
>
> The data partition was created with:
>
> $ sudo mkfs.xfs -f -l lazy-count=1 /dev/sdd4
>
> and mounted via:
>
> $ sudo mount -o nobarrier,allocsize=4096 /dev/sdd4 /ceph2
>
> I've attached my ceph.conf and the fio template FWIW.
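The attached ceph.conf and fio template aren't reproduced in the archive,
but a minimal fio job file along these lines would drive the workload shown
in the results below. The pool, image, and client names here are assumptions
for illustration, not Mark's actual settings, and the rbd image must exist
first (e.g. rbd create fio_test --size 10240):

[global]
# fio's librbd engine - talks to the cluster directly, bypassing the kernel rbd driver
ioengine=rbd
# clientname/pool/rbdname below are assumed values, adjust to your cluster
clientname=admin
pool=rbd
rbdname=fio_test
rw=randwrite
bs=4k
iodepth=64

# job name matching the output that follows; size matches io=1024.0MB in the results
[rbd_thread]
size=1024m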
> 2x Crucial m550 (1x journal, 1x data)
>
> rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.11-20-g9a44
> Starting 1 process
> rbd_thread: (groupid=0, jobs=1): err= 0: pid=5511: Sun Aug 31 17:33:40 2014
>   write: io=1024.0MB, bw=24694KB/s, iops=6173, runt= 42462msec
>     slat (usec): min=11, max=4086, avg=51.19, stdev=59.30
>     clat (msec): min=3, max=24, avg= 9.99, stdev= 1.57
>      lat (msec): min=3, max=24, avg=10.04, stdev= 1.57
>     clat percentiles (usec):
>      |  1.00th=[ 6624],  5.00th=[ 7584], 10.00th=[ 8032], 20.00th=[ 8640],
>      | 30.00th=[ 9152], 40.00th=[ 9536], 50.00th=[ 9920], 60.00th=[10304],
>      | 70.00th=[10816], 80.00th=[11328], 90.00th=[11968], 95.00th=[12480],
>      | 99.00th=[13888], 99.50th=[14528], 99.90th=[17024], 99.95th=[19584],
>      | 99.99th=[23168]
>     bw (KB  /s): min=23158, max=25592, per=100.00%, avg=24711.65, stdev=470.72
>     lat (msec) : 4=0.01%, 10=50.69%, 20=49.26%, 50=0.04%
>   cpu          : usr=25.27%, sys=2.68%, ctx=266729, majf=0, minf=16773
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.3%, 32=83.8%, >=64=15.8%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=93.8%, 8=2.9%, 16=2.2%, 32=1.0%, 64=0.1%, >=64=0.0%
>      issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
>
> 1x Crucial m550 (journal + data)
>
> rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
> fio-2.1.11-20-g9a44
> Starting 1 process
> rbd_thread: (groupid=0, jobs=1): err= 0: pid=6887: Sun Aug 31 17:42:22 2014
>   write: io=1024.0MB, bw=32778KB/s, iops=8194, runt= 31990msec
>     slat (usec): min=10, max=4016, avg=45.68, stdev=41.60
>     clat (usec): min=428, max=25688, avg=7658.03, stdev=1600.65
>      lat (usec): min=923, max=25757, avg=7703.72, stdev=1598.77
>     clat percentiles (usec):
>      |  1.00th=[ 3440],  5.00th=[ 5216], 10.00th=[ 6048], 20.00th=[ 6624],
>      | 30.00th=[ 7008], 40.00th=[ 7328], 50.00th=[ 7584], 60.00th=[ 7904],
>      | 70.00th=[ 8256], 80.00th=[ 8640], 90.00th=[ 9280], 95.00th=[10048],
>      | 99.00th=[12864], 99.50th=[14528], 99.90th=[17536], 99.95th=[19328],
>      | 99.99th=[21888]
>     bw (KB  /s): min=30768, max=35160, per=100.00%, avg=32907.35, stdev=934.80
>     lat (usec) : 500=0.01%, 1000=0.01%
>     lat (msec) : 2=0.04%, 4=1.80%, 10=93.15%, 20=4.97%, 50=0.04%
>   cpu          : usr=32.32%, sys=3.05%, ctx=179657, majf=0, minf=16751
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=59.7%, >=64=40.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=96.8%, 8=2.6%, 16=0.5%, 32=0.1%, 64=0.1%, >=64=0.0%
>      issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=64

I'm digging a bit more to try to understand this slightly surprising
result. For that last benchmark I'd used a file rather than a device
journal on the same ssd:

$ ls -l /ceph2
total 15360040
-rw-r--r--  1 root root          37 Sep  1 12:00 ceph_fsid
drwxr-xr-x 68 root root        4096 Sep  1 12:00 current
-rw-r--r--  1 root root          37 Sep  1 12:00 fsid
-rw-r--r--  1 root root 15728640000 Sep  1 12:00 journal
-rw-------  1 root root          56 Sep  1 12:00 keyring
-rw-r--r--  1 root root          21 Sep  1 12:00 magic
-rw-r--r--  1 root root           6 Sep  1 12:00 ready
-rw-r--r--  1 root root           4 Sep  1 12:00 store_version
-rw-r--r--  1 root root          53 Sep  1 12:00 superblock
-rw-r--r--  1 root root           2 Sep  1 12:00 whoami

Let's try a more standard device journal on another partition of the
same ssd.
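For anyone wanting to reproduce the switch, a sketch of the usual
FileStore procedure - assuming the OSD is osd.0 and /dev/sdd1 is the
intended journal partition (adjust to your layout):

$ sudo service ceph stop osd.0          # stop the OSD before touching the journal
$ sudo ceph-osd -i 0 --flush-journal    # drain outstanding journal entries to the store
$ sudo rm /ceph2/journal                # drop the old journal file
$ sudo ln -s /dev/sdd1 /ceph2/journal   # point the OSD at the raw partition
$ sudo ceph-osd -i 0 --mkjournal        # initialise the new device journal
$ sudo service ceph start osd.0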
1x Crucial m550 (device journal + data):

$ ls -l /ceph2
total 36
-rw-r--r--  1 root root   37 Sep  1 12:02 ceph_fsid
drwxr-xr-x 68 root root 4096 Sep  1 12:02 current
-rw-r--r--  1 root root   37 Sep  1 12:02 fsid
lrwxrwxrwx  1 root root    9 Sep  1 12:02 journal -> /dev/sdd1
-rw-------  1 root root   56 Sep  1 12:02 keyring
-rw-r--r--  1 root root   21 Sep  1 12:02 magic
-rw-r--r--  1 root root    6 Sep  1 12:02 ready
-rw-r--r--  1 root root    4 Sep  1 12:02 store_version
-rw-r--r--  1 root root   53 Sep  1 12:02 superblock
-rw-r--r--  1 root root    2 Sep  1 12:02 whoami

rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=4463: Mon Sep  1 09:16:16 2014
  write: io=1024.0MB, bw=22105KB/s, iops=5526, runt= 47436msec
    slat (usec): min=11, max=4054, avg=52.66, stdev=62.79
    clat (msec): min=3, max=43, avg=11.20, stdev= 1.69
     lat (msec): min=4, max=43, avg=11.25, stdev= 1.69
    clat percentiles (usec):
     |  1.00th=[ 7904],  5.00th=[ 8896], 10.00th=[ 9408], 20.00th=[10048],
     | 30.00th=[10432], 40.00th=[10688], 50.00th=[11072], 60.00th=[11456],
     | 70.00th=[11712], 80.00th=[12224], 90.00th=[12992], 95.00th=[13888],
     | 99.00th=[16768], 99.50th=[17792], 99.90th=[20352], 99.95th=[24960],
     | 99.99th=[42240]
    bw (KB  /s): min=20285, max=23537, per=100.00%, avg=22126.98, stdev=579.19
    lat (msec) : 4=0.01%, 10=20.03%, 20=79.86%, 50=0.11%
  cpu          : usr=23.48%, sys=2.58%, ctx=302278, majf=0, minf=16786
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.6%, 32=82.8%, >=64=16.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.9%, 8=3.0%, 16=2.0%, 32=1.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

So we seem to lose a bit of performance there (5.5K vs 8K IOPS).
Finally, let's use 2 ssds again, but with only a file journal on the
2nd one.

2x Crucial m550 (1x file journal, 1x data):

rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=6943: Mon Sep  1 11:18:01 2014
  write: io=1024.0MB, bw=32248KB/s, iops=8062, runt= 32516msec
    slat (usec): min=11, max=4843, avg=45.42, stdev=43.57
    clat (usec): min=657, max=22614, avg=7806.70, stdev=1319.02
     lat (msec): min=1, max=22, avg= 7.85, stdev= 1.32
    clat percentiles (usec):
     |  1.00th=[ 4384],  5.00th=[ 5984], 10.00th=[ 6432], 20.00th=[ 6880],
     | 30.00th=[ 7200], 40.00th=[ 7520], 50.00th=[ 7776], 60.00th=[ 8032],
     | 70.00th=[ 8384], 80.00th=[ 8640], 90.00th=[ 9152], 95.00th=[ 9664],
     | 99.00th=[11328], 99.50th=[13376], 99.90th=[17536], 99.95th=[18304],
     | 99.99th=[21376]
    bw (KB  /s): min=30408, max=35320, per=100.00%, avg=32339.56, stdev=937.80
    lat (usec) : 750=0.01%
    lat (msec) : 2=0.03%, 4=0.70%, 10=95.96%, 20=3.29%, 50=0.02%
  cpu          : usr=31.37%, sys=3.42%, ctx=181872, majf=0, minf=16759
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=56.6%, >=64=43.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.1%, 8=2.4%, 16=0.4%, 32=0.1%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

So we are back up to 8K IOPS.
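The journal placement in these runs comes down to the osd journal path.
A minimal ceph.conf fragment for the best-performing layout (file journal
on the second ssd) might look like this - /ceph1 is an assumed mount point
for the second m550, not taken from Mark's attached config:

[osd.0]
    ; data filesystem on the first m550
    osd data = /ceph2
    ; file journal on the second m550 (assumed mount point)
    osd journal = /ceph1/journal
    ; size in MB; 15000 matches the 15728640000-byte journal file above
    osd journal size = 15000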
Observe we are not maxing out the ssds:

Device:  rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00    0.00   0.00    0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00   0.00    0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdd        0.00 5048.00   0.00 7550.00   0.00  83.43    22.63     2.80   0.37    0.00    0.37   0.04  31.60
sdc        0.00    0.00   0.00 7145.00   0.00  72.21    20.70     0.27   0.04    0.00    0.04   0.04  26.80

Allegedly this model ssd (128G m550) can do 75K 4k random write IOPS
(running fio on the filesystem I've seen 70K IOPS, so that is reasonably
believable). So anyway we are not getting anywhere near the max IOPS
from our devices.

We use the Intel S3700 for production ceph servers, so I'll see if we
have any I can test on - it would be interesting to see whether I hit
the same 3.5K IOPS ceiling or not.

Cheers

Mark
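For reference, a filesystem-level sanity check of the sort mentioned
above can be run directly against the data mount, taking Ceph out of the
picture - a sketch, with the target directory, size, and runtime being
assumptions:

$ fio --name=fs_randwrite --directory=/ceph2 --size=4g \
      --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
      --iodepth=64 --runtime=60 --time_based

Comparing the iops figure it reports with the ~8K seen through the whole
Ceph stack helps separate raw device limits from OSD-path overhead.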