Re: Low RBD Performance

On 02/03/2014 07:29 PM, Gruher, Joseph R wrote:
Hi folks-

I’m having trouble demonstrating reasonable performance of RBDs.  I’m
running Ceph 0.72.2 on Ubuntu 13.04 with the 3.12 kernel.  I have four
dual-Xeon servers, each with 24GB RAM, and an Intel 320 SSD for journals
and four WD 10K RPM SAS drives for OSDs, all connected with an LSI
1078.  This is just a lab experiment using scrounged hardware, so nothing
is sized specifically to be a Ceph cluster; it's just what I have lying
around, but I should have more than enough CPU and memory resources.
Everything is connected with a single 10GbE network.

When testing with RBDs from four clients (also running Ubuntu 13.04 with
3.12 kernel) I am having trouble breaking 300 IOPS on a 4KB random read
or write workload (cephx set to none, replication set to one).  IO is
generated using FIO from four clients, each hosting a single 1TB RBD,
and I’ve experimented with queue depths and increasing the number of
RBDs without any benefit.  300 IOPS for a pool of 16 10K RPM HDDs seems
quite low, not to mention that the journal should provide a good boost on
write workloads.  When I run a 4KB object write workload in Cosbench I
can approach 3500 obj/sec, which seems more reasonable.

Sample FIO configuration:

[global]
ioengine=libaio
direct=1
ramp_time=300
runtime=300

[4k-rw]
description=4k-rw
filename=/dev/rbd1
rw=randwrite
bs=4k
stonewall

I use --iodepth=X on the FIO command line to set the queue depth when
testing.
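
For reference, the full invocation looks roughly like this (the job file
name and the depth of 32 are just example values):

fio --iodepth=32 4k-rw.fio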

I notice in the FIO output that, despite the iodepth setting, it seems to
be reporting an IO depth of only 1, which would certainly help explain the
poor performance. I'm at a loss as to why; I wonder if it could be
something specific to RBD behavior, like needing to use a different IO
engine to establish queue depth.

IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%

Any thoughts appreciated!

Interesting results with the IO depth at 1. I haven't seen that behaviour when using libaio, direct=1, and higher IO depths. Is this kernel RBD or QEMU/KVM? If it's QEMU/KVM, is it the libvirt driver?
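
If it does turn out to be the kernel RBD client, one way to take it out of
the picture is to try fio's librbd engine directly, assuming your fio build
includes it. A rough job sketch (pool and image names are placeholders):

[global]
ioengine=rbd          # drives librbd directly, bypassing the kernel RBD device
clientname=admin      # cephx user id; mostly a formality with auth disabled
pool=rbd              # placeholder pool name
rbdname=fio-test      # placeholder image name
rw=randwrite
bs=4k
runtime=300

[4k-rw-librbd]
iodepth=32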

Certainly 300 IOPS is low for that kind of setup compared to what we've seen for RBD on other systems (especially with 1x replication). Given that you are seeing more reasonable performance with RGW, I guess I'd look at a couple of things:

- Figure out why fio is reporting queue depth = 1 (see the job file sketch below for setting iodepth there directly)
- Does increasing numjobs help (i.e., get concurrency another way)?
- Do you have enough PGs in the RBD pool? (A couple of example commands are sketched below.)
- If it's QEMU/KVM, are you using the virtio driver?
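
To make a couple of those concrete (the pool name, PG counts, and job
values below are just examples; a common rule of thumb is on the order of
100 PGs per OSD):

ceph osd pool get rbd pg_num        # see how many PGs the pool currently has
ceph osd pool set rbd pg_num 1024   # example value for ~16 OSDs at 1x replication
ceph osd pool set rbd pgp_num 1024  # keep pgp_num in step with pg_num

For the queue depth question, it may also be worth setting it in the job
file itself rather than on the command line, e.g.:

[4k-rw]
description=4k-rw
filename=/dev/rbd1
rw=randwrite
bs=4k
iodepth=32   # example depth set directly in the job file
numjobs=4    # another way to add concurrency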


Thanks,

Joe



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

