Re: rbd_cache, limiting read on high iops around 40k

Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> · Mon, 22 Jun 2015 11:28:03 +0200

Oh so it only works for virtio disks? I'm using scsi with the virtio PCI controller.

Stefan
Excuse my typo sent from my mobile phone.

On 22.06.2015 at 11:26, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:

Will it be possible in Proxmox 3.4 to add this at least in the configuration file? Or does it require a change to the KVM source code?
Thanks. 

This small patch on top of qemu-server should be enough (I think it should apply on 3.4 sources without problem)

https://git.proxmox.com/?p=qemu-server.git;a=commit;h=51f492cd6da0228129aaab1393b5c5844d75a53c

No need to hack qemu-kvm

----- Original Message -----
From: "Irek Fasikhov" <malmyzh@xxxxxxxxx>
To: "aderumier" <aderumier@xxxxxxxxx>
Cc: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>, "pushpesh sharma" <pushpesh.eck@xxxxxxxxx>, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Monday, 22 June 2015 11:04:42
Subject: Re: rbd_cache, limiting read on high iops around 40k

| Proxmox 4.0 will allow enabling or disabling one iothread per disk.
Alexandre, useful option!
Will it be possible in Proxmox 3.4 to add this at least in the configuration file? Or does it require a change to the KVM source code?
Thanks. 

2015-06-22 11:54 GMT+03:00 Alexandre DERUMIER < aderumier@xxxxxxxxx > : 

This is already possible in Proxmox 3.4 (with the latest qemu-kvm 2.2.x updates), but you have to set iothread:1 in the conf file. With a single drive, the effect on performance is ambiguous.

Yes and no ;) 

Currently in Proxmox 3.4, iothread:1 generates only one iothread for all disks.

So you'll get a small extra boost, but it will not scale with multiple disks.

Proxmox 4.0 will allow enabling or disabling one iothread per disk.

Does it also help for single disks or only multiple disks? 

An iothread can also help with a single disk, because by default qemu uses its main thread for disk I/O as well as for other things (I don't remember exactly which).

----- Original Message -----
From: "Irek Fasikhov" < malmyzh@xxxxxxxxx >
To: "Stefan Priebe" < s.priebe@xxxxxxxxxxxx >
Cc: "aderumier" < aderumier@xxxxxxxxx >, "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >, "Somnath Roy" < Somnath.Roy@xxxxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Monday, 22 June 2015 09:22:13
Subject: Re: rbd_cache, limiting read on high iops around 40k

This is already possible in Proxmox 3.4 (with the latest qemu-kvm 2.2.x updates), but you have to set iothread:1 in the conf file. With a single drive, the effect on performance is ambiguous.

2015-06-22 10:12 GMT+03:00 Stefan Priebe - Profihost AG < s.priebe@xxxxxxxxxxxx > : 

On 22.06.2015 at 09:08, Alexandre DERUMIER < aderumier@xxxxxxxxx > wrote:

Just an update: there seems to be no proper way to pass the iothread
parameter from openstack-nova (at least not in the Juno release). So a
default single iothread per VM is all we have. In conclusion, the
max iops of a nova instance on ceph rbd will be limited to 30-40K.

Thanks for the update. 

For proxmox users, 

I have added iothread option to gui for proxmox 4.0 

Can we make iothread the default? Does it also help for single disks or only multiple disks? 

and added jemalloc as default memory allocator 

I have also sent a jemalloc patch to the qemu-devel mailing list:
https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg05265.html 

(Help is welcome to push it in qemu upstream ! ) 

----- Original Message -----
From: "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >
Cc: "Somnath Roy" < Somnath.Roy@xxxxxxxxxxx >, "Irek Fasikhov" < malmyzh@xxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Monday, 22 June 2015 07:58:47
Subject: Re: rbd_cache, limiting read on high iops around 40k

Just an update: there seems to be no proper way to pass the iothread
parameter from openstack-nova (at least not in the Juno release). So a
default single iothread per VM is all we have. In conclusion, the
max iops of a nova instance on ceph rbd will be limited to 30-40K.

On Tue, Jun 16, 2015 at 10:08 PM, Alexandre DERUMIER 
< aderumier@xxxxxxxxx > wrote: 
Hi, 

Some news about qemu with tcmalloc vs jemalloc.

I'm testing with multiple disks (with iothreads) in 1 qemu guest.

Although tcmalloc is a little faster than jemalloc,

I have hit the tcmalloc::ThreadCache::ReleaseToCentralCache bug many times.

Increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES doesn't help.

With multiple disks, I'm around 200k iops with tcmalloc (before hitting the bug) and 350k iops with jemalloc.

The problem is that once I hit the malloc bug, I'm down to around 4000-10000 iops, and the only way to fix it is to restart qemu ...

----- Original Message -----
From: "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >
Cc: "Somnath Roy" < Somnath.Roy@xxxxxxxxxxx >, "Irek Fasikhov" < malmyzh@xxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Friday, 12 June 2015 08:58:21
Subject: Re: rbd_cache, limiting read on high iops around 40k

Thanks, I posted the question on the openstack list. Hopefully we will get some
expert opinion.

On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER 
< aderumier@xxxxxxxxx > wrote: 
Hi, 

Here is a libvirt XML sample from the libvirt source

(you need to define the <iothreads> count, then assign them to disks).

I don't use openstack, so I really don't know how it works with it.

<domain type='qemu'>
  <name>QEMUGuest1</name>
  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
  <memory unit='KiB'>219136</memory>
  <currentMemory unit='KiB'>219136</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <iothreads>2</iothreads>
  <os>
    <type arch='i686' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' iothread='1'/>
      <source file='/var/lib/libvirt/images/iothrtest1.img'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' iothread='2'/>
      <source file='/var/lib/libvirt/images/iothrtest2.img'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
    <controller type='usb' index='0'/>
    <controller type='ide' index='0'/>
    <controller type='pci' index='0' model='pci-root'/>
    <memballoon model='none'/>
  </devices>
</domain>
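
To make the two steps in the sample concrete (declare the <iothreads> count, then reference an iothread id from each disk's <driver>), here is a minimal Python sketch that patches a domain XML string. The helper name assign_iothreads and the "one iothread per virtio disk" policy are assumptions made for illustration; this is not something libvirt or nova ships, and it does not address the nova regeneration problem discussed in this thread.

```python
import xml.etree.ElementTree as ET


def assign_iothreads(domain_xml: str) -> str:
    """Give every virtio disk its own iothread (hypothetical helper)."""
    root = ET.fromstring(domain_xml)

    # Only disks whose <target> uses bus='virtio' are considered.
    disks = []
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is not None and target.get("bus") == "virtio":
            disks.append(disk)
    if not disks:
        return domain_xml

    # <iothreads>N</iothreads> is a direct child of <domain>.
    # If missing, it is appended at the end of <domain>; this sketch
    # assumes the position among the other children does not matter.
    node = root.find("iothreads")
    if node is None:
        node = ET.SubElement(root, "iothreads")
    node.text = str(len(disks))

    # iothread ids start at 1, as in the sample above.
    for idx, disk in enumerate(disks, start=1):
        driver = disk.find("driver")
        if driver is None:
            driver = ET.SubElement(disk, "driver", {"name": "qemu"})
        driver.set("iothread", str(idx))

    return ET.tostring(root, encoding="unicode")
```

Feeding it the output of 'virsh dumpxml' and inspecting the result before redefining the domain would be one way to try it; whether the change survives depends on what manages the domain.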

----- Original Message -----
From: "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >
Cc: "Somnath Roy" < Somnath.Roy@xxxxxxxxxxx >, "Irek Fasikhov" < malmyzh@xxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Friday, 12 June 2015 07:52:41
Subject: Re: rbd_cache, limiting read on high iops around 40k

Hi Alexandre, 

I agree with your rationale of one iothread per disk. CPU consumed in
IOwait is pretty high in each VM. But I can't find a way to set
this on a nova instance. I am using openstack Juno with QEMU+KVM.
As per the libvirt documentation for setting iothreads, I can edit
domain.xml directly and achieve the same effect. However, in an
openstack env the domain xml is created by nova with some additional
metadata, so editing the domain xml using 'virsh edit' does not seem
to work (I agree, it is not a very cloud way of doing things, but a
hack). Changes made there vanish after saving them, because
libvirt validation fails on them.

#virsh dumpxml instance-000000c5 > vm.xml 
#virt-xml-validate vm.xml 
Relax-NG validity error : Extra element cpu in interleave 
vm.xml:1: element domain: Relax-NG validity error : Element domain 
failed to validate content 
vm.xml fails to validate 

The second approach I took was setting QoS in volume types. But there
is no option to set iothreads per volume; there are only parameters related
to max_read/write ops/bytes.

Thirdly, editing the Nova flavor and providing extra specs like
hw:cpu_socket/thread/core can change the guest CPU topology, but again
there is no way to set iothreads. It does accept hw_disk_iothreads (no type check
in place, I believe), but cannot pass it into domain.xml.

Could you suggest a way to set this?

-Pushpesh 

On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER 
< aderumier@xxxxxxxxx > wrote: 
I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-) 

Sure no problem. 

(BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks, with 1 iothread per disk)

----- Original Message -----
From: "Somnath Roy" < Somnath.Roy@xxxxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >, "Irek Fasikhov" < malmyzh@xxxxxxxxx >
Cc: "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Wednesday, 10 June 2015 09:06:32
Subject: RE: rbd_cache, limiting read on high iops around 40k

Hi Alexandre, 
Thanks for sharing the data. 
I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-) 

Regards 
Somnath 

-----Original Message----- 
From: ceph-users [mailto: ceph-users-bounces@xxxxxxxxxxxxxx ] On Behalf Of Alexandre DERUMIER 
Sent: Tuesday, June 09, 2015 10:42 PM 
To: Irek Fasikhov 
Cc: ceph-devel; pushpesh sharma; ceph-users 
Subject: Re:  rbd_cache, limiting read on high iops around 40k 

Very good work! 
Do you have a rpm-file? 
Thanks. 
No, sorry, I compiled it manually (and I'm using debian jessie as client).

----- Original Message -----
From: "Irek Fasikhov" < malmyzh@xxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >
Cc: "Robert LeBlanc" < robert@xxxxxxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Wednesday, 10 June 2015 07:21:42
Subject: Re:  rbd_cache, limiting read on high iops around 40k

Hi, Alexandre. 

Very good work! 
Do you have a rpm-file? 
Thanks. 

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < aderumier@xxxxxxxxx > : 

Hi, 

I have tested qemu with the latest tcmalloc 2.4, and the improvement is huge with iothread: 50k iops (+45%)!

qemu : no iothread : glibc : iops=33395
qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%)
qemu : no-iothread : jemalloc : iops=42226 (+26%)
qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)

qemu : iothread : glibc : iops=34516
qemu : iothread : tcmalloc : iops=38676 (+12%)
qemu : iothread : jemalloc : iops=28023 (-19%)
qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
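
For readers who want to re-derive the percentages in the summary, here is a small Python sketch that recomputes each allocator's change relative to the glibc row of the same mode. The iops values are copied from the summary as posted; the dictionary layout and the function name relative_change are only illustrative. The posted percentages appear to be rounded or truncated, so a recomputed value may differ by about one point.

```python
# iops figures as posted in the summary; glibc is the baseline per mode.
RESULTS = {
    "no-iothread": {
        "glibc": 33395,
        "tcmalloc 2.2.1": 34516,
        "jemalloc": 42226,
        "tcmalloc 2.4": 35974,
    },
    "iothread": {
        "glibc": 34516,
        "tcmalloc 2.2.1": 38676,
        "jemalloc": 28023,
        "tcmalloc 2.4": 50276,
    },
}


def relative_change(iops: int, baseline: int) -> float:
    """Percentage change of iops relative to baseline."""
    return (iops - baseline) / baseline * 100.0


for mode, rows in RESULTS.items():
    base = rows["glibc"]
    for allocator, iops in rows.items():
        change = relative_change(iops, base)
        print(f"{mode:12s} {allocator:15s} {iops:6d} {change:+6.1f}%")
```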

qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
------------------------------------------------------ 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 2015
read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec
slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58
clat (usec): min=128, max=6262, avg=631.41, stdev=197.71
lat (usec): min=149, max=6265, avg=635.27, stdev=197.40
clat percentiles (usec):
| 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474], 
| 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652], 
| 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980], 
| 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896], 
| 99.99th=[ 3760] 
bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87
lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63%
lat (msec) : 2=4.46%, 4=0.03%, 10=0.01%
cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, mint=26070msec, maxt=26070msec 

Disk stats (read/write): 
vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73% 

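qemu : no-iothread : tcmalloc (2.4) : iops=35974
------------------------------------------------------
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32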
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 06:05:06 2015
read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec
slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35
clat (usec): min=191, max=4740, avg=884.66, stdev=315.65
lat (usec): min=289, max=4743, avg=888.31, stdev=315.51
clat percentiles (usec):
| 1.00th=[ 462], 5.00th=[ 516], 10.00th=[ 548], 20.00th=[ 596], 
| 30.00th=[ 652], 40.00th=[ 764], 50.00th=[ 868], 60.00th=[ 940], 
| 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416], 
| 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640], 
| 99.99th=[ 3632] 
bw (KB /s): min=98352, max=177328, per=99.91%, avg=143772.11, stdev=21782.39
lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01%
lat (msec) : 2=29.74%, 4=1.07%, 10=0.01%
cpu : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s, mint=36435msec, maxt=36435msec 

Disk stats (read/write): 
vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716, util=99.85% 

----- Original Message -----
From: "aderumier" < aderumier@xxxxxxxxx >
To: "Robert LeBlanc" < robert@xxxxxxxxxxxxx >
Cc: "Mark Nelson" < mnelson@xxxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Tuesday, 9 June 2015 18:47:27
Subject: Re:  rbd_cache, limiting read on high iops around 40k

Hi Robert, 

What I found was that Ceph OSDs performed well with either tcmalloc or 
jemalloc (except when RocksDB was built with jemalloc instead of 
tcmalloc, I'm still working to dig into why that might be the case). 
Yes, from my tests, for the OSD tcmalloc is a little faster (but only very little) than jemalloc.

However, I found that tcmalloc with QEMU/KVM was very detrimental to 
small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much 
better for QEMU/KVM in the tests that we ran. [1] 

I have just done a qemu test (4k randread - rbd_cache=off), and I don't see a speed regression with tcmalloc.
With qemu iothread, tcmalloc has a speed increase over glibc.
With qemu iothread, jemalloc has a speed decrease.

Without iothread, jemalloc has a big speed increase.

This is with
-qemu 2.3
-tcmalloc 2.2.1
-jemalloc 3.6
-libc6 2.19

qemu : no iothread : glibc : iops=33395
qemu : no-iothread : tcmalloc : iops=34516 (+3%)
qemu : no-iothread : jemalloc : iops=42226 (+26%)

qemu : iothread : glibc : iops=34516
qemu : iothread : tcmalloc : iops=38676 (+12%)
qemu : iothread : jemalloc : iops=28023 (-19%)

(The benefit of iothreads is that we can scale with more disks in one VM.)

fio results: 
------------ 

qemu : iothread : tcmalloc : iops=38676 
----------------------------------------- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=0): [r(1)] [100.0% done] [123.5MB/0KB/0KB /s] [31.6K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=1265: Tue Jun 9 18:16:53 2015 
read : io=5120.0MB, bw=154707KB/s, iops=38676, runt= 33889msec 
slat (usec): min=1, max=715, avg= 3.63, stdev= 3.42 
clat (usec): min=152, max=5736, avg=822.12, stdev=289.34 
lat (usec): min=231, max=5740, avg=826.10, stdev=289.08 
clat percentiles (usec): 
| 1.00th=[ 402], 5.00th=[ 466], 10.00th=[ 510], 20.00th=[ 572], 
| 30.00th=[ 636], 40.00th=[ 716], 50.00th=[ 780], 60.00th=[ 852], 
| 70.00th=[ 932], 80.00th=[ 1020], 90.00th=[ 1160], 95.00th=[ 1352], 
| 99.00th=[ 1800], 99.50th=[ 1944], 99.90th=[ 2256], 99.95th=[ 2448], 
| 99.99th=[ 3888] 
bw (KB /s): min=123888, max=198584, per=100.00%, avg=154824.40, stdev=16978.03 
lat (usec) : 250=0.01%, 500=8.91%, 750=36.44%, 1000=32.63% 
lat (msec) : 2=21.65%, 4=0.37%, 10=0.01% 
cpu : usr=8.29%, sys=19.76%, ctx=55882, majf=0, minf=39 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=154707KB/s, minb=154707KB/s, maxb=154707KB/s, mint=33889msec, maxt=33889msec 

Disk stats (read/write): 
vdb: ios=1302739/0, merge=0/0, ticks=934444/0, in_queue=934096, util=99.77% 
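
As a sanity check on how the fio figures above relate to each other, here is a short Python sketch. It assumes a fixed 4 KB block size (bs=4K in the job line) and a queue that stays full at iodepth=32 (the "IO depths" line reports 32=100.0%), so that Little's law applies. The function names are illustrative only.

```python
def bandwidth_kb_s(iops: float, block_size_kb: float = 4.0) -> float:
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_size_kb


def iops_from_latency(iodepth: int, avg_lat_usec: float) -> float:
    """Little's law: requests in flight = throughput x latency."""
    return iodepth / (avg_lat_usec / 1_000_000.0)


# Figures from the "iothread : tcmalloc" run above.
print(bandwidth_kb_s(38676))          # compare with bw=154707KB/s
print(iops_from_latency(32, 826.10))  # compare with iops=38676
```

Both values land within a fraction of a percent of what fio reports, which suggests that at this queue depth the iops ceiling is set by per-request latency (about 0.8 ms here) rather than by bandwidth.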

qemu : no-iothread : tcmalloc : iops=34516 
--------------------------------------------- 
Jobs: 1 (f=1): [r(1)] [100.0% done] [163.2MB/0KB/0KB /s] [41.8K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=896: Tue Jun 9 18:19:08 2015 
read : io=5120.0MB, bw=138065KB/s, iops=34516, runt= 37974msec 
slat (usec): min=1, max=708, avg= 3.98, stdev= 3.57 
clat (usec): min=208, max=11858, avg=921.43, stdev=333.61 
lat (usec): min=266, max=11862, avg=925.77, stdev=333.40 
clat percentiles (usec): 
| 1.00th=[ 434], 5.00th=[ 510], 10.00th=[ 564], 20.00th=[ 652], 
| 30.00th=[ 732], 40.00th=[ 812], 50.00th=[ 876], 60.00th=[ 940], 
| 70.00th=[ 1020], 80.00th=[ 1112], 90.00th=[ 1320], 95.00th=[ 1576], 
| 99.00th=[ 1992], 99.50th=[ 2128], 99.90th=[ 2736], 99.95th=[ 3248], 
| 99.99th=[ 4320] 
bw (KB /s): min=77312, max=185576, per=99.74%, avg=137709.88, stdev=16883.77 
lat (usec) : 250=0.01%, 500=4.36%, 750=27.61%, 1000=35.60% 
lat (msec) : 2=31.49%, 4=0.92%, 10=0.02%, 20=0.01% 
cpu : usr=7.19%, sys=19.52%, ctx=55903, majf=0, minf=38 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=138064KB/s, minb=138064KB/s, maxb=138064KB/s, mint=37974msec, maxt=37974msec 

Disk stats (read/write): 
vdb: ios=1309902/0, merge=0/0, ticks=1068768/0, in_queue=1068396, util=99.86% 

qemu : iothread : glibc : iops=34516 
------------------------------------- 

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [133.4MB/0KB/0KB /s] [34.2K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=876: Tue Jun 9 18:24:01 2015 
read : io=5120.0MB, bw=137786KB/s, iops=34446, runt= 38051msec 
slat (usec): min=1, max=496, avg= 3.88, stdev= 3.66 
clat (usec): min=283, max=7515, avg=923.34, stdev=300.28 
lat (usec): min=286, max=7519, avg=927.58, stdev=300.02 
clat percentiles (usec): 
| 1.00th=[ 506], 5.00th=[ 564], 10.00th=[ 596], 20.00th=[ 652], 
| 30.00th=[ 724], 40.00th=[ 804], 50.00th=[ 884], 60.00th=[ 964], 
| 70.00th=[ 1048], 80.00th=[ 1144], 90.00th=[ 1304], 95.00th=[ 1448], 
| 99.00th=[ 1896], 99.50th=[ 2096], 99.90th=[ 2480], 99.95th=[ 2640], 
| 99.99th=[ 3984] 
bw (KB /s): min=102680, max=171112, per=100.00%, avg=137877.78, stdev=15521.30 
lat (usec) : 500=0.84%, 750=32.97%, 1000=30.82% 
lat (msec) : 2=34.65%, 4=0.71%, 10=0.01% 
cpu : usr=7.42%, sys=19.47%, ctx=52455, majf=0, minf=38 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=137785KB/s, minb=137785KB/s, maxb=137785KB/s, mint=38051msec, maxt=38051msec 

Disk stats (read/write): 
vdb: ios=1307426/0, merge=0/0, ticks=1051416/0, in_queue=1050972, util=99.85% 

qemu : no iothread : glibc : iops=33395 
----------------------------------------- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [125.4MB/0KB/0KB /s] [32.9K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=886: Tue Jun 9 18:27:18 2015 
read : io=5120.0MB, bw=133583KB/s, iops=33395, runt= 39248msec 
slat (usec): min=1, max=1054, avg= 3.86, stdev= 4.29 
clat (usec): min=139, max=12635, avg=952.85, stdev=335.51 
lat (usec): min=303, max=12638, avg=957.01, stdev=335.29 
clat percentiles (usec): 
| 1.00th=[ 516], 5.00th=[ 564], 10.00th=[ 596], 20.00th=[ 652], 
| 30.00th=[ 724], 40.00th=[ 820], 50.00th=[ 924], 60.00th=[ 996], 
| 70.00th=[ 1080], 80.00th=[ 1176], 90.00th=[ 1336], 95.00th=[ 1528], 
| 99.00th=[ 2096], 99.50th=[ 2320], 99.90th=[ 2672], 99.95th=[ 2928], 
| 99.99th=[ 4832] 
bw (KB /s): min=98136, max=171624, per=100.00%, avg=133682.64, stdev=19121.91 
lat (usec) : 250=0.01%, 500=0.57%, 750=32.57%, 1000=26.98% 
lat (msec) : 2=38.59%, 4=1.28%, 10=0.01%, 20=0.01% 
cpu : usr=9.24%, sys=15.92%, ctx=51219, majf=0, minf=38 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=133583KB/s, minb=133583KB/s, maxb=133583KB/s, mint=39248msec, maxt=39248msec 

Disk stats (read/write): 
vdb: ios=1304526/0, merge=0/0, ticks=1075020/0, in_queue=1074536, util=99.84% 

qemu : iothread : jemalloc : iops=28023
---------------------------------------- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [97.9% done] [155.2MB/0KB/0KB /s] [39.1K/0/0 iops] [eta 00m:01s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=899: Tue Jun 9 18:30:26 2015 
read : io=5120.0MB, bw=112094KB/s, iops=28023, runt= 46772msec 
slat (usec): min=1, max=467, avg= 4.33, stdev= 4.77 
clat (usec): min=253, max=11307, avg=1135.63, stdev=346.55 
lat (usec): min=256, max=11309, avg=1140.39, stdev=346.22 
clat percentiles (usec): 
| 1.00th=[ 510], 5.00th=[ 628], 10.00th=[ 700], 20.00th=[ 820], 
| 30.00th=[ 924], 40.00th=[ 1032], 50.00th=[ 1128], 60.00th=[ 1224], 
| 70.00th=[ 1320], 80.00th=[ 1416], 90.00th=[ 1560], 95.00th=[ 1688], 
| 99.00th=[ 2096], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2832], 
| 99.99th=[ 3760] 
bw (KB /s): min=91792, max=174416, per=99.90%, avg=111985.27, stdev=17381.70 
lat (usec) : 500=0.80%, 750=13.10%, 1000=23.33% 
lat (msec) : 2=61.30%, 4=1.46%, 10=0.01%, 20=0.01% 
cpu : usr=7.12%, sys=17.43%, ctx=54507, majf=0, minf=38 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=112094KB/s, minb=112094KB/s, maxb=112094KB/s, mint=46772msec, maxt=46772msec 

Disk stats (read/write): 
vdb: ios=1309169/0, merge=0/0, ticks=1305796/0, in_queue=1305376, util=98.68% 

qemu : no-iothread : jemalloc : iops=42226
-------------------------------------------- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 
fio-2.1.11 
Starting 1 process 
Jobs: 1 (f=1): [r(1)] [100.0% done] [171.2MB/0KB/0KB /s] [43.9K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=892: Tue Jun 9 18:34:11 2015 
read : io=5120.0MB, bw=177130KB/s, iops=44282, runt= 29599msec 
slat (usec): min=1, max=527, avg= 3.80, stdev= 3.74 
clat (usec): min=174, max=3841, avg=717.08, stdev=237.53 
lat (usec): min=210, max=3844, avg=721.23, stdev=237.22 
clat percentiles (usec): 
| 1.00th=[ 354], 5.00th=[ 422], 10.00th=[ 462], 20.00th=[ 516], 
| 30.00th=[ 572], 40.00th=[ 628], 50.00th=[ 684], 60.00th=[ 740], 
| 70.00th=[ 804], 80.00th=[ 884], 90.00th=[ 1004], 95.00th=[ 1128], 
| 99.00th=[ 1544], 99.50th=[ 1672], 99.90th=[ 1928], 99.95th=[ 2064], 
| 99.99th=[ 2608] 
bw (KB /s): min=138120, max=230816, per=100.00%, avg=177192.14, stdev=23440.79 
lat (usec) : 250=0.01%, 500=16.24%, 750=45.93%, 1000=27.46% 
lat (msec) : 2=10.30%, 4=0.07% 
cpu : usr=10.14%, sys=23.84%, ctx=60938, majf=0, minf=39 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=5120.0MB, aggrb=177130KB/s, minb=177130KB/s, maxb=177130KB/s, mint=29599msec, maxt=29599msec 

Disk stats (read/write): 
vdb: ios=1303992/0, merge=0/0, ticks=798008/0, in_queue=797636, util=99.80% 

----- Original Message -----
From: "Robert LeBlanc" < robert@xxxxxxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >
Cc: "Mark Nelson" < mnelson@xxxxxxxxxx >, "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Tuesday, 9 June 2015 18:00:29
Subject: Re:  rbd_cache, limiting read on high iops around 40k


I also saw a similar performance increase by using alternative memory 
allocators. What I found was that Ceph OSDs performed well with either 
tcmalloc or jemalloc (except when RocksDB was built with jemalloc 
instead of tcmalloc, I'm still working to dig into why that might be 
the case). 

However, I found that tcmalloc with QEMU/KVM was very detrimental to 
small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much 
better for QEMU/KVM in the tests that we ran. [1] 

I'm currently looking into I/O bottlenecks around the 16KB range and 
I'm seeing a lot of time in thread creation and destruction, the 
memory allocators are quite a bit down the list (both fio with 
ioengine rbd and on the OSDs). I wonder what the difference can be. 
I've tried using the async messenger but there wasn't a huge 
difference. [2] 

Further down the rabbit hole.... 

[1] https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg20197.html 
[2] https://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg23982.html 
---------------- 
Robert LeBlanc 
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 

On Tue, Jun 9, 2015 at 6:02 AM, Alexandre DERUMIER < aderumier@xxxxxxxxx > wrote: 
Frankly, I'm a little impressed that without RBD cache we can hit 80K 
IOPS from 1 VM! 

Note that these results are not from inside a VM (fio-rbd on the host), so in a VM we'll have overhead.
(I'm planning to send qemu results soon)

How fast are the SSDs in those 3 OSDs? 

These results are with data in the buffer memory of the OSD nodes.

When reading fully from SSD (Intel S3500),

for 1 client,

I'm around 33k iops without cache and 32k iops with cache, with 1 osd.
I'm around 55k iops without cache and 38k iops with cache, with 3 osd.

With multiple client jobs, I can reach around 70k iops per OSD, and 250k iops per OSD when data is in buffer.

(Server/client CPUs are 2x 10-core 3.1 GHz E5 Xeon)

Small tip:
I'm using tcmalloc for fio-rbd or rados bench to improve latencies by around 20%

LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ... 
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ... 

as a lot of time is spent in malloc/free 
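
If the benchmark is launched from a script rather than a shell, the same LD_PRELOAD trick can be applied per command. The following is a minimal Python sketch under a few assumptions: the library path is the one from the two commands above and may differ on other distributions, and the helper names preload_env and run_preloaded are made up for illustration.

```python
import os
import subprocess

# Path taken from the commands above; adjust for your distribution.
TCMALLOC = "/usr/lib/libtcmalloc_minimal.so.4"


def preload_env(library: str = TCMALLOC, base=None) -> dict:
    """Return a copy of the environment with `library` preloaded first."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD accepts a colon-separated list; keep existing entries.
    parts = [library] + ([existing] if existing else [])
    env["LD_PRELOAD"] = ":".join(parts)
    return env


def run_preloaded(cmd: list, library: str = TCMALLOC) -> int:
    """Run `cmd` with the allocator preloaded and return its exit code."""
    if not os.path.exists(library):
        raise FileNotFoundError(library)
    return subprocess.run(cmd, env=preload_env(library)).returncode
```

Only the child process sees the modified environment, so the calling script keeps its default allocator.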

(qemu has also supported tcmalloc for some months; I'll bench it too
https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html )

I'll try to send full bench results soon, from 1 to 18 SSD OSDs.

----- Original Message -----
From: "Mark Nelson" < mnelson@xxxxxxxxxx >
To: "aderumier" < aderumier@xxxxxxxxx >, "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >
Cc: "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Tuesday, 9 June 2015 13:36:31
Subject: Re:  rbd_cache, limiting read on high iops around 40k

Hi All, 

In the past we've hit some performance issues with RBD cache that we've 
fixed, but we've never really tried pushing a single VM beyond 40+K read 
IOPS in testing (or at least I never have). I suspect there's a couple 
of possibilities as to why it might be slower, but perhaps joshd can 
chime in as he's more familiar with what that code looks like. 

Frankly, I'm a little impressed that without RBD cache we can hit 80K 
IOPS from 1 VM! How fast are the SSDs in those 3 OSDs? 

Mark 

On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote: 
It seems that the limit mainly shows up at high queue depths (roughly > 16).

Here are the results in iops with 1 client, 4k randread, 3 OSDs, at different queue depths.
With queue depth < 16, rbd_cache gives almost the same result as without cache.

cache 
----- 
qd1: 1651 
qd2: 3482 
qd4: 7958 
qd8: 17912 
qd16: 36020 
qd32: 42765 
qd64: 46169 

no cache 
-------- 
qd1: 1748 
qd2: 3570 
qd4: 8356 
qd8: 17732 
qd16: 41396 
qd32: 78633 
qd64: 79063 
qd128: 79550 
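
To see where the two series diverge, the numbers above can be turned into a cache/no-cache ratio per queue depth. This Python sketch only restates the posted figures; the names CACHE, NO_CACHE and cache_ratio are illustrative.

```python
# iops per queue depth, copied from the two lists above.
CACHE = {1: 1651, 2: 3482, 4: 7958, 8: 17912, 16: 36020, 32: 42765, 64: 46169}
NO_CACHE = {1: 1748, 2: 3570, 4: 8356, 8: 17732, 16: 41396, 32: 78633,
            64: 79063, 128: 79550}


def cache_ratio(queue_depth: int) -> float:
    """iops with rbd_cache divided by iops without it, at one queue depth."""
    return CACHE[queue_depth] / NO_CACHE[queue_depth]


# Only queue depths measured in both series can be compared.
for qd in sorted(set(CACHE) & set(NO_CACHE)):
    print(f"qd{qd:<4d} cache/no-cache = {cache_ratio(qd):.2f}")
```

Up to qd8 the ratio stays close to 1; at qd16 it starts to drop, and at qd32 and qd64 the cached run delivers a little over half of the uncached iops.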

----- Original Message -----
From: "aderumier" < aderumier@xxxxxxxxx >
To: "pushpesh sharma" < pushpesh.eck@xxxxxxxxx >
Cc: "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx >
Sent: Tuesday, 9 June 2015 09:28:21
Subject: Re:  rbd_cache, limiting read on high iops around 40k

Hi, 

We tried adding more RBDs to single VM, but no luck. 

If you want to scale with more disks in a single qemu VM, you need to use qemu's iothread feature and assign 1 iothread per disk (works with virtio-blk).
It's working for me: I can scale by adding more disks.
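
For anyone launching qemu by hand, "1 iothread per disk" roughly translates to one -object iothread per drive, referenced from the matching virtio-blk-pci device. The Python sketch below only builds such an argument list; the helper name iothread_args, the id naming scheme, the rbd:pool/image drive spec and the format/cache values are assumptions for illustration, so check them against your qemu version before use.

```python
def iothread_args(images, cache="none"):
    """Build qemu arguments giving each RBD image its own iothread.

    `images` is a list of "pool/image" strings (hypothetical names).
    """
    args = []
    for i, image in enumerate(images):
        args += [
            # One iothread object per disk.
            "-object", f"iothread,id=iothread{i}",
            # Backend drive, not attached to any bus yet (if=none).
            "-drive",
            f"file=rbd:{image},if=none,id=drive{i},format=raw,cache={cache}",
            # virtio-blk device bound to the drive and to its iothread.
            "-device",
            f"virtio-blk-pci,drive=drive{i},iothread=iothread{i}",
        ]
    return args


print(" ".join(iothread_args(["rbd/vm-disk-1", "rbd/vm-disk-2"])))
```

Management layers such as libvirt or Proxmox generate equivalent options themselves, so this is mainly useful for standalone benchmarking.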

My benchmarks here are done with fio-rbd on the host.
I can scale up to 400k iops with 10 clients and rbd_cache=off on a single host, and around 250k iops with 10 clients and rbd_cache=on.

I just wonder why I don't have a performance decrease around 30k iops with 1 OSD.

I'm going to check whether this tracker issue 
http://tracker.ceph.com/issues/11056 

could be the cause. 

(My master build was done some weeks ago.) 

----- Original message ----- 
From: "pushpesh sharma" < pushpesh.eck@xxxxxxxxx > 
To: "aderumier" < aderumier@xxxxxxxxx > 
Cc: "ceph-devel" < ceph-devel@xxxxxxxxxxxxxxx >, "ceph-users" < ceph-users@xxxxxxxxxxxxxx > 
Sent: Tuesday 9 June 2015 09:21:04 
Subject: Re: rbd_cache, limiting read on high iops around 40k 

Hi Alexandre, 

We have also seen something very similar on Hammer (0.94-1). We were benchmarking VMs hosted on a hypervisor (QEMU-KVM, OpenStack Juno). Each Ubuntu VM has one RBD as its root disk and one RBD as additional storage. For some reason, 4K random-read (RR) IOPS on each VM would not scale beyond 35-40k. We tried adding more RBDs to a single VM, but no luck. However, increasing the number of VMs to 4 on a single hypervisor did scale to some extent. Beyond that, adding more VMs brought little benefit. 

Here is the trend we have seen (the x-axis is the number of hypervisors; each hypervisor has 4 VMs, each VM has 1 RBD): 

VDbench was used as the benchmarking tool. We were not saturating the network or the CPUs on the OSD nodes. We were not able to saturate the CPUs on the hypervisors either, which is where we suspected some throttling effect. However, we haven't set any such limits on the nova or KVM side. We tried CPU pinning and other KVM-related tuning as well, but no luck. 

We tried the same experiment on bare metal. There, 4K RR IOPS scaled from 40K (1 RBD) to 180K (4 RBDs). Beyond that point, the numbers actually degraded rather than scaling further. (Single pipe, more congestion effect.) 

We never suspected that enabling rbd cache could be detrimental to performance. It would be nice to root-cause the problem if that is the case. 

On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER < aderumier@xxxxxxxxx > wrote: 

Hi, 

I'm doing benchmarks (ceph master branch) with randread 4k, qdepth=32, 
and rbd_cache=true seems to limit the IOPS to around 40k. 

no cache 
-------- 
1 client - rbd_cache=false - 1osd : 38300 iops 
1 client - rbd_cache=false - 2osd : 69073 iops 
1 client - rbd_cache=false - 3osd : 78292 iops 

cache 
----- 
1 client - rbd_cache=true - 1osd : 38100 iops 
1 client - rbd_cache=true - 2osd : 42457 iops 
1 client - rbd_cache=true - 3osd : 45823 iops 

Is this expected? 
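
To put the scaling difference into numbers, the following Python sketch computes the speedup of each configuration relative to its own single-OSD result. It uses only the figures from the two tables above; the dictionary layout is just one convenient way to hold them.

```python
# IOPS for 1 client at qdepth=32, copied from the tables above.
results = {
    "rbd_cache=false": {1: 38300, 2: 69073, 3: 78292},
    "rbd_cache=true": {1: 38100, 2: 42457, 3: 45823},
}

# Speedup of each configuration relative to its own single-OSD result.
speedup = {
    name: {osds: iops / by_osd[1] for osds, iops in by_osd.items()}
    for name, by_osd in results.items()
}

for name, by_osd in speedup.items():
    print(name, ", ".join(f"{n} osd: {s:.2f}x" for n, s in by_osd.items()))
```

Without the cache, going from 1 to 3 OSDs roughly doubles throughput (about 2.04x); with the cache the gain is only about 20%.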

fio result rbd_cache=false 3 osd 
-------------------------------- 
rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 2015 
read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec 
slat (usec): min=5, max=530, avg=11.77, stdev= 6.77 
clat (usec): min=70, max=2240, avg=336.08, stdev=94.82 
lat (usec): min=101, max=2247, avg=347.84, stdev=95.49 
clat percentiles (usec): 
| 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262], 
| 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346], 
| 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506], 
| 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948], 
| 99.99th=[ 1176] 
bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21 
lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23% 
lat (msec) : 2=0.03%, 4=0.01% 
cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452 
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0% 
issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, mint=32698msec, maxt=32698msec 

Disk stats (read/write): 
dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00% 
sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00% 

fio result rbd_cache=true 3osd 
------------------------------ 

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32 
fio-2.1.11 
Starting 1 process 
rbd engine: RBD version: 0.1.9 
Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 00m:00s] 
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9 07:47:30 2015 
read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec 
slat (usec): min=7, max=805, avg=21.26, stdev=15.84 
clat (usec): min=101, max=4602, avg=478.55, stdev=143.73 
lat (usec): min=123, max=4669, avg=499.80, stdev=146.03 
clat percentiles (usec): 
| 1.00th=[ 227], 5.00th=[ 274], 10.00th=[ 306], 20.00th=[ 350], 
| 30.00th=[ 390], 40.00th=[ 430], 50.00th=[ 470], 60.00th=[ 506], 
| 70.00th=[ 548], 80.00th=[ 596], 90.00th=[ 660], 95.00th=[ 724], 
| 99.00th=[ 844], 99.50th=[ 908], 99.90th=[ 1112], 99.95th=[ 1288], 
| 99.99th=[ 2192] 
bw (KB /s): min=115280, max=204416, per=100.00%, avg=183315.10, stdev=15079.93 
lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28% 
lat (msec) : 2=0.19%, 4=0.01%, 10=0.01% 
cpu : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100 
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0% 
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
complete : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0% 
issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0 
latency : target=0, window=0, percentile=100.00%, depth=32 

Run status group 0 (all jobs): 
READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, mint=55866msec, maxt=55866msec 

Disk stats (read/write): 
dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, aggrios=0/29, aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01% 
sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01% 
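
One way to read the two fio reports above is through Little's law (average requests in flight = throughput x average latency). The Python sketch below applies it to the reported iops and average total latency, and cross-checks iops against the reported bandwidth. Reading the result as an "effective queue depth" is an interpretation, not something stated in the thread; the helper name `in_flight` is made up for the example.

```python
def in_flight(iops, avg_lat_usec):
    """Little's law: mean requests in flight = throughput * mean latency."""
    return iops * avg_lat_usec / 1_000_000


# iops and 'lat (usec)' avg values taken from the two fio reports above.
no_cache = in_flight(78292, 347.84)
cache = in_flight(45823, 499.80)

print(f"rbd_cache=false: {no_cache:.1f} requests in flight on average")
print(f"rbd_cache=true:  {cache:.1f} requests in flight on average")

# Cross-check: iops * 4 KB block size should be close to the reported bandwidth.
print(78292 * 4, "KB/s computed vs 313169 KB/s reported")
print(45823 * 4, "KB/s computed vs 183296 KB/s reported")
```

Both values come out below the configured iodepth=32 (roughly 27 and 23).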
_______________________________________________ 
ceph-users mailing list 
ceph-users@xxxxxxxxxxxxxx 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

-- 
Best regards, Фасихов Ирек Нургаязович 
Mobile: +79229045757 

-- 
-Pushpesh 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 