Re: poor write performance

On 04/22/2013 06:34 AM, James Harper wrote:
Hi,

Correct, but that's the theoretical maximum I was referring to. If I calculate
that I should be able to get 50MB/second, then 30MB/second is acceptable
but 500KB/second is not :)

I have written a small benchmark for RBD:

https://gist.github.com/smunaut/5433222

It uses the librbd API directly (no kernel client) and queues requests
well in advance, so it should give an upper bound on what you can get
at best.
It reads and writes the whole image, so I usually just create a 1 or 2
GB image for testing.
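
(For anyone wanting to try it: the gist is a plain C program against librbd/librados, so building it is roughly the following, assuming it is saved as rbdbench.c and the librbd/librados dev packages are installed; the three arguments look like client id, pool and image name, going by the invocations further down.)

# gcc rbdbench.c -o rbdbench -lrbd -lrados
# ./rbdbench admin rbd test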

Using two OSDs on two separate, recent 7200rpm drives (with the journal
on the same disk as the data), I get:

Read: 89.52 Mb/s (2147483648 bytes in 22877 ms)
Write: 10.62 Mb/s (2147483648 bytes in 192874 ms)


I like your benchmark tool!

How many replicas? With two OSDs (xfs) on ~3 year old 1TB disks and two replicas I get:

# ./a.out admin xen test
Read: 111.99 Mb/s (1073741824 bytes in 9144 ms)
Write: 29.68 Mb/s (1073741824 bytes in 34507 ms)
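
(The replica count for a pool can be read back from the OSD map with something like the following; the exact wording of the output varies between versions.)

# ceph osd dump | grep 'rep size'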

Those read numbers mean I forgot to drop caches on the OSDs, so I was seeing the limit of my public network (single gigabit interface). After dropping caches I consistently get:

# ./a.out admin xen test
Read: 39.98 Mb/s (1073741824 bytes in 25614 ms)
Write: 23.11 Mb/s (1073741824 bytes in 44316 ms)
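
(The cache drop on each OSD host is just the usual sysfs knob, roughly:)

# sync
# echo 3 > /proc/sys/vm/drop_caches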

The journal is on the same disk as the data. The network is... confusing :) but basically the public network is a single gigabit link and the cluster network is a bonded pair of gigabit links. The whole network is shared with my existing DRBD cluster, so performance may vary over time.

My read speed is consistently around 40MB/second, and my write speed is consistently around 22MB/second. I had expected reads to do better...
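
(As a cross-check below the librbd layer, rados bench against the same pool shows what raw RADOS throughput looks like; something along these lines, with the pool name and options adjusted to taste:)

# rados -p rbd bench 30 write -t 16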

You may want to try increasing your read_ahead_kb on the OSD data disks and see if that helps read speeds.
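
(Assuming the OSD data disk is /dev/sdb, which is only an example device name, that would be something like:)

# cat /sys/block/sdb/queue/read_ahead_kb
# echo 4096 > /sys/block/sdb/queue/read_ahead_kb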


While the benchmark is running, iostat on each OSD reports a read rate of around 20MB/second during the read test (half the client total on each) and 40-60MB/second of writes during the write test (~2x the client total on each), which is pretty much exactly right: with two replicas and the journal on the same spindle, every byte the client writes lands on each OSD's disk twice, once for the journal and once for the data.

iperf on the cluster network (the bonded pair of gigabit links) gives me about 1.97Gbit/second. iperf between an OSD and the client is around 0.94Gbit/second.
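
(These are plain iperf runs, i.e. iperf -s on one node and something like the following on the other; with most bonding modes more than one parallel stream is needed to see past a single link's worth of bandwidth.)

# iperf -c <other-host> -P 2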

Changing the I/O scheduler on the hard disks doesn't seem to make any difference, even when I set it to cfq, which normally really sucks.
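
(The scheduler is switched per device via sysfs, e.g. the following, with sdb again only a placeholder; the cat shows the available choices with the active one in brackets.)

# cat /sys/block/sdb/queue/scheduler
# echo deadline > /sys/block/sdb/queue/scheduler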

What ceph version are you using and what filesystem?
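
(ceph -v on each node and a quick look at the OSD data mounts will answer both, e.g.:)

# ceph -v
# mount | grep -E 'osd|ceph'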

Thanks

James





