RBD vs. raw RADOS performance

Hi,

we're using Ceph to serve VM images via RBD, so RBD performance is important
to us. I've run some write benchmarks with different object sizes, once using
'rados bench' directly and once using 'rbd bench-write'.
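
In rough terms the benchmark loop looks like the sketch below; the real
scripts are in the Bitbucket repo, see [2] at the end. Pool name 'bench',
image name 'benchimg', the 60 s runtime and 16 in-flight ops are just
placeholders, and the exact CLI flags may differ between Ceph versions:

# Sketch of the benchmark loop -- not the actual scripts from [2].
import subprocess

SIZES = [4096, 16384, 65536, 131072, 524288, 4194304]  # write sizes in bytes

for size in SIZES:
    # raw RADOS: write objects of the given size for 60 s, 16 ops in flight
    subprocess.check_call([
        "rados", "bench", "-p", "bench", "60", "write",
        "-b", str(size), "-t", "16",
    ])
    # RBD: writes of the same size against a test image in the same pool
    subprocess.check_call([
        "rbd", "bench-write", "benchimg", "-p", "bench",
        "--io-size", str(size), "--io-threads", "16",
        "--io-total", str(1024 * 1024 * 1024),
    ])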

The results are interesting: raw RADOS write rates are significantly better
for large objects (>128k), RBD performs better for medium-sized objects (>16k,
<128k), but RBD is really slow for small writes. We have a lot of small writes,
so this is the pain point. I suspect per-operation latency is dominant here.
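
To illustrate why I suspect latency: with synchronous writes at low queue
depth, throughput is roughly io_size divided by per-operation latency, long
before any bandwidth limit matters. A quick back-of-envelope calculation (the
2 ms per-op latency below is an assumed figure for illustration, not a
measurement from our cluster):

# Back-of-envelope only; 2 ms per acknowledged write is an assumption.
per_op_latency = 0.002   # seconds per write at queue depth 1 (assumed)
for io_size in (4096, 65536, 4194304):
    mb_per_s = io_size / per_op_latency / 1e6
    print("%7d byte writes -> ~%.1f MB/s" % (io_size, mb_per_s))
# 4 KiB writes stay around 2 MB/s, whereas 4 MiB writes would saturate a
# GigE link long before per-op latency becomes the bottleneck.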

Our test setup consists of two Ceph servers running a MON and 9 OSDs (one OSD
daemon per disk; ext4 filesystem) with journals on a shared SSD (one SSD
partition per OSD). There are two GigE networks (storage frontend/backend) with
approx. 62 µs RTT and jumbo frames enabled. See the attached ceph.conf for
further details. Some parameters there are taken from the tuning
recommendations at [1]. Note that I have to stick to ext4 on the OSDs.
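
To isolate per-operation latency from both benchmark tools, I could also time
single synchronous writes through the librados Python binding, roughly like
this (pool name 'bench' and object name 'latency-probe' are placeholders; it
needs the python-rados package and a readable ceph.conf/keyring):

# Rough 4 KiB write latency probe via librados.
import time
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("bench")

payload = b"\0" * 4096
samples = []
for _ in range(100):
    start = time.time()
    ioctx.write_full("latency-probe", payload)  # returns once the write is acked
    samples.append(time.time() - start)

ioctx.remove_object("latency-probe")
ioctx.close()
cluster.shutdown()

samples.sort()
print("median 4 KiB write latency: %.2f ms" % (samples[len(samples) // 2] * 1000))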

Is there anything we can do to improve latencies? I don't know where to start:

* OSD setup?
* Network setup?
* ceph.conf parameter tuning?
* Separate MONs?
* Separate networks for MON access?

There are a lot of options... so I would be grateful for hints on what is
worth looking at.
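
As a first data point before touching any of the above, I was thinking of
pulling the latency-related performance counters from one OSD's admin socket
to see where the time goes, roughly like this (the socket path follows the
default /var/run/ceph naming; counter and section names differ between Ceph
versions):

# Print every OSD perf counter whose name mentions "latency".
import json
import subprocess

SOCK = "/var/run/ceph/ceph-osd.0.asok"
out = subprocess.check_output(["ceph", "--admin-daemon", SOCK, "perf", "dump"])
counters = json.loads(out)

for section, values in sorted(counters.items()):
    for name, value in sorted(values.items()):
        if "latency" in name:
            print("%s / %s: %s" % (section, name, value))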

Please refer to Bitbucket [2] for the benchmark scripts.

TIA

Christian

[1] http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/
[2] https://bitbucket.org/ckauhaus/ceph_performance
-- 
Dipl.-Inf. Christian Kauhaus <>< · kc@xxxxxxxxxx · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations

Attachment: rados_vs_rbd_write_performance.png (PNG image)

[global]
fsid = b67bad36-3273-11e3-a2ed-0200000311bf
public network = 172.20.4.0/24
cluster network = 172.20.8.0/24
osd pool default min size = 1
osd pool default size = 2
osd pool default pg num = 25
osd pool default pgp num = 25
mon host = cartman02.sto.dev.gocept.net,kyle02.sto.dev.gocept.net,patty.sto.dev.gocept.net
ms dispatch throttle bytes = 335544320

[client]
log file = /var/log/ceph/client.log
rbd cache = true
rbd default format = 2

[mon]
mon host = cartman02,kyle02,patty
mon addr = 172.20.4.6:6789,172.20.4.9:6789,172.20.4.10:6789
mon data = /srv/ceph/mon/$cluster-$id

[mon.cartman02]
host = cartman02
mon addr = 172.20.4.6:6789
public addr = 172.20.4.6:6789
cluster addr = 172.20.8.4:6789

[mon.kyle02]
# ...

[osd]
public addr = 172.20.4.6
cluster addr = 172.20.8.4
filestore fiemap = true
filestore op threads = 1
filestore queue committing max bytes = 167772160
filestore queue max bytes = 167772160
filestore xattr use omap = true
journal max write bytes = 167772160
journal queue max bytes = 167772160
osd deep scrub interval = 2592000
osd journal size = 0
osd op threads = 4

[osd.0]
host = cartman02
osd uuid = c4b6d576-86d3-5e9a-9661-36b1fa36f4cf
osd data = /srv/ceph/osd/ceph-0
osd journal = /dev/vgjnl00/ceph-jnl00
filestore max sync interval = 102

[osd.1]
# ...