Re: under performing osd, where to look ?

Mark Nelson <mark.nelson@xxxxxxxxxxx> · Mon, 01 Apr 2013 19:35:29 -0500

On 03/31/2013 06:37 PM, Matthieu Patou wrote:
Hi,

I was doing some testing with iozone and found that the performance of an
exported rbd volume was 1/3 of the performance of the hard drives.
I was expecting a performance penalty, but not such a large one.
I suspect something is not correct in the configuration, but I can't tell
what exactly.

A couple of things:

1) Were you testing writes?  If so, were your journals on the same disks 
as the OSDs?  Ceph currently writes a full copy of the data to the 
journal, which means you are doing 2 writes for every 1.  This is why 
some people put 3-4 journals on each fast SSD.  On the other hand, 
losing that SSD takes out every OSD journaling to it, so more data has 
to be re-replicated, and you may lose read performance and capacity if 
the SSD is taking up a spot that an OSD would otherwise have occupied.

2) Are you factoring in any replication being done?

3) Do you have enough concurrent operations to hide latency?  You may 
want to play with different io depth values in iozone and see if that 
helps.  You may also want to try using multiple clients at the same time.

4) If you are using QEMU/KVM, is rbd cache enabled?  Also, are you using 
the virtio driver?  It's significantly faster than the ide driver.

5) There's always going to be a certain amount of overhead.  I have a 
node with 24 drives and all journals on SSDs.  Theoretically the drives 
can do about 3.4GB/s aggregate just doing fio directly to the disks. 
With EXT4/BTRFS I can do anywhere from 2.2-2.6GB/s with RADOS bench 
writing objects out directly to ceph.  With XFS, that drops to ~1.9GB/s 
max.  Now throw QEMU/KVM RBD on and performance drops to about 1.5GB/s. 
Each layer of software introduces a little more overhead.  Having said 
that, 1.5GB/s out of a ~$10-12k node using a system that can grow on 
demand and replicate the way ceph can is fantastic imho, and every major 
release so far is seeing big performance improvements.
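
To put the figures in point 5 side by side, here is a small sketch that 
expresses each layer as a fraction of the raw fio number.  The throughput 
values are the ones quoted above; the variable and function names are only 
illustrative, and the ~1.9GB/s XFS maximum is treated as a single value.

```python
# Throughput figures quoted in point 5, in GB/s.
RAW_DISKS_GB_S = 3.4  # fio directly to the 24 drives

# (low, high) per layer; a single quoted figure is repeated for both ends.
MEASURED_GB_S = {
    "RADOS bench, EXT4/BTRFS": (2.2, 2.6),
    "RADOS bench, XFS": (1.9, 1.9),
    "QEMU/KVM RBD": (1.5, 1.5),
}


def fraction_of_raw(measured, raw):
    """Return each layer's (low, high) throughput as a fraction of raw."""
    return {
        name: (round(low / raw, 2), round(high / raw, 2))
        for name, (low, high) in measured.items()
    }


fractions = fraction_of_raw(MEASURED_GB_S, RAW_DISKS_GB_S)
for name, (low, high) in fractions.items():
    print(f"{name}: {low:.0%} to {high:.0%} of raw disk throughput")
```

Even on this tuned node with SSD journals, RBD under QEMU/KVM delivers a 
bit under half of what the disks can do on their own.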

Mark

My test setup consists of 2 OSDs with one drive each; each has 2 networks
(public/private).
On osd0 and osd1 I'm using a BCM5751 gigabit card for the public network
and an Intel 82541PI for the private.
On the client I'm using a RTL8111/8168B for the public network.

Osd0 & Osd1 are connected directly with a cable for the private network
and they are connected through one switch for the public network.
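
A rough way to reason about a setup like this is to combine points 1 and 2
from the reply with the gigabit links and see which limit is lowest.  The
sketch below is only a back-of-the-envelope model, not a measurement: the
100 MB/s per-disk figure, the replica count of 2 and the journals sharing
the data disks are all assumptions that were not stated in this thread, and
the function name is hypothetical.

```python
def write_ceiling_mb_s(disk_mb_s, n_osds, replicas, journal_on_data_disk,
                       public_link_mb_s, private_link_mb_s):
    """Return (ceiling, limiting_factor) for sustained client writes, in MB/s."""
    # Point 1: a journal on the data disk means every byte is written twice.
    journal_factor = 2 if journal_on_data_disk else 1
    # Point 2: each client byte is stored `replicas` times across the OSDs.
    limits = {
        "disks": n_osds * disk_mb_s / (journal_factor * replicas),
        "public network": public_link_mb_s,
    }
    if replicas > 1:
        # Each extra copy crosses the private network once.  Treating the
        # private link as one shared pipe is a pessimistic simplification.
        limits["private network"] = private_link_mb_s / (replicas - 1)
    name = min(limits, key=limits.get)
    return limits[name], name


GIGABIT_MB_S = 1000 / 8  # 125 MB/s line rate, before any protocol overhead

ceiling, factor = write_ceiling_mb_s(
    disk_mb_s=100,              # assumed per-disk sequential write speed
    n_osds=2,
    replicas=2,                 # assumed pool replication level
    journal_on_data_disk=True,  # assumed journal placement
    public_link_mb_s=GIGABIT_MB_S,
    private_link_mb_s=GIGABIT_MB_S,
)
print(f"write ceiling: {ceiling:.0f} MB/s, limited by the {factor}")
```

Under those assumptions the disks, not the network cards, are the first
limit.  Plugging in the real disk speed, replica count and journal
placement shows whether that holds for the actual cluster.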

Where could the bottleneck be located?

Matthieu.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
