Re: rbd performance issue - can't find bottleneck


 



On 06/17/2015 03:34 PM, Mark Nelson wrote:
On 06/17/2015 04:10 AM, Jacek Jarosiewicz wrote:
Hi,


[ cut ]


~60MB/s seq writes
~100MB/s seq reads
~2-3k iops random reads

Is this per SSD or aggregate?

aggregate (if I understand you correctly). This is what I see when I run the tests on the client - a mapped and mounted RBD.
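
Roughly, a client-side run looks like the sketch below (image name, device and mount point are just examples, and the exact fio parameters vary between runs):

# map the image with the kernel client, put a filesystem on it and test that
rbd map rbd/test-image              # shows up as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt/rbdtest
fio --name=seqwrite --directory=/mnt/rbdtest --rw=write --bs=4M \
    --size=10G --direct=1 --numjobs=1 --runtime=60 --time_based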



The client is an RBD mapped and mounted on an Ubuntu Linux box. All the servers (OSD
nodes and the client) are running Ubuntu Server 14.04. We tried
switching to CentOS 7 - but the results are the same.

Is this kernel RBD or a VM using QEMU/KVM?  You might want to try fio
with the librbd engine and see if you get the same results.  Also,
radosbench isn't exactly analogous, but you might try some large
sequential write / sequential read tests just as a sanity check.


This is kernel RBD - testing performance on VMs will be the next step.
I've tried fio with librbd, but the results were similar.
I'll run the rados bench tests and post my results.
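
Just so we're talking about the same thing, I'm planning something along these lines (pool and image names are examples):

# large sequential write, then sequential and random reads of those objects
rados bench -p rbd 60 write -b 4194304 -t 16 --no-cleanup
rados bench -p rbd 60 seq -t 16
rados bench -p rbd 60 rand -t 16

# the fio/librbd run I mentioned looked more or less like this
fio --name=librbd-test --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=test-image --rw=write --bs=4M --iodepth=16 \
    --runtime=60 --time_based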


Here are some technical details about our setup:

Four identical OSD nodes:
E5-1630 CPU
32 GB RAM
Mellanox MT27520 56Gbps network cards
SATA controller LSI Logic SAS3008

Specs look fine.


The storage nodes are connected to a SuperMicro 847E1C-R1K28JBOD chassis.

Is that where the SSDs live?  I'm not a fan of such heavy expander
over-subscription, but if you are getting good results outside of Ceph
I'm guessing it's something else.


No, the SSDs are connected to the integrated Intel SATA controller (C610/X99).

The only disks that reside in the SuperMicro chassis are the SATA drives, and in the latest tests I don't use them - the results I gave are from SSDs only (one SSD serves as the OSD and the journal is on another SSD).
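
The journal placement is nothing fancy - it's pointed at the second SSD with something like this in ceph.conf (or the equivalent ceph-disk journal partition); device paths here are just examples:

[osd.0]
    osd journal = /dev/disk/by-partlabel/journal-osd0
    osd journal size = 10240        # MB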


Four monitors (one on each node). We do not use CephFS so we do not run
ceph-mds.

You'll want to go down to 3 or up to 5.  Even numbers of monitors don't
really help you in any way (and can actually hurt).  I'd suggest 3.


OK, will do that, thanks!
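
If I understand the procedure right, dropping the fourth mon comes down to roughly this (mon name is an example):

# on the node being removed (Ubuntu 14.04 / upstart)
stop ceph-mon id=node4
ceph mon remove node4
# then drop that node from the mon-related entries in ceph.conf on all hosts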


You didn't mention the brand/model of SSDs.  Especially for writes this
is important as ceph journal writes are O_DSYNC.  Drives that have
proper power loss protection often can ignore ATA_CMD_FLUSH and do these
very quickly while other drives may need to flush to the flash cells.
Also, keep in mind for writes that if you have journals on the SSDs and
3X replication, you'll be doing 6 writes for every client write.


The SSDs are Intel SSDSC2BW240A4.
The rbd pool is set to min_size 1 and size 2.
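
To see how this particular model handles O_DSYNC writes I can test the journal SSD directly with something like the following (careful - this writes to the raw device, so only on an unused partition; /dev/sdX1 is a placeholder):

# classic 4k sync-write test of a journal device
dd if=/dev/zero of=/dev/sdX1 bs=4k count=100000 oflag=direct,dsync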

For reads and read IOPs on SSDs, you might try disabling in-memory
logging and ceph authentication.  You might be interested in some
testing we did on a variety of SSDs here:

http://www.spinics.net/lists/ceph-users/msg15733.html


Will read up on that too, thanks!
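
If I read that right, the quick test is something like this in ceph.conf on all nodes, followed by a restart (a sketch based on the suggestion above - the second number in each debug setting is the in-memory log level):

[global]
    auth cluster required = none
    auth service required = none
    auth client required = none
    debug ms = 0/0
    debug osd = 0/0
    debug filestore = 0/0
    debug journal = 0/0
    debug auth = 0/0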

J

--
Jacek Jarosiewicz
IT Systems Administrator

----------------------------------------------------------------------------------------
SUPERMEDIA Sp. z o.o. with its registered office in Warsaw
ul. Senatorska 13/15, 00-075 Warszawa
District Court for the Capital City of Warsaw, XII Commercial Division of the National Court Register,
KRS no. 0000029537; share capital PLN 42,756,000
NIP: 957-05-49-503
Correspondence address: ul. Jubilerska 10, 04-190 Warszawa

----------------------------------------------------------------------------------------
SUPERMEDIA ->   http://www.supermedia.pl
internet access - hosting - colocation - data links - telephony
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



