Re: Ceph performance with 8K blocks.

Jason Villalta <jason@xxxxxxxxxxxx> · Fri, 20 Sep 2013 19:27:14 -0400

Thanks Jamie,
I tried that too, but got similar results.  The issue looks like it may be latency.  Everything is running on one server, so logically I would not expect any added latency, but according to this page something may still be causing slow results.  See the Co-Residency section:

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/

I have not found a way to prove this other than testing many different configurations of OSDs and drives.  At one point I had 3 OSDs all running on one SSD drive.  The performance was the same as when the three OSDs were running on 3 separate SSD drives.  It seems like something else is going on here.

Also, I ran iotop while running rados bench and sqlio in the virtual machine.  Writes max out at 200-300 MB/s for the duration of the test.  Reads never hit a sustained rate anywhere near that speed.

On Fri, Sep 20, 2013 at 7:18 PM, Jamie Alquiza <ja@xxxxxxxxxxxxxxxxx> wrote:

I thought I'd just throw this in there, as I've been following this thread: dd also has an 'iflag' directive just like the 'oflag'. 

I don't have a deep, offhand recollection of the caching mechanisms at play here, but assuming you want a solid synchronous / non-cached read, you should probably specify 'iflag=direct'.
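For illustration, a minimal sketch of the buffered versus direct read comparison suggested above. It assumes GNU dd. The file name ddbenchfile appears elsewhere in this thread; the BENCH_FILE variable and the 64 MiB size are hypothetical choices made for this example only. Some filesystems (tmpfs, for instance) reject O_DIRECT, so the direct read is allowed to fail with a message.

```shell
# Hypothetical location of the benchmark file; override with BENCH_FILE.
bench_file="${BENCH_FILE:-./ddbenchfile}"

# Create a small file of zeros if none exists (64 MiB here, chosen only
# to keep the example quick; a real test would use a much larger file).
[ -f "$bench_file" ] || dd if=/dev/zero of="$bench_file" bs=1M count=64 2>/dev/null

# Buffered read: may be served partly or wholly from the page cache.
dd if="$bench_file" of=/dev/null bs=8K

# Direct read: opens the input with O_DIRECT, bypassing the page cache.
dd if="$bench_file" of=/dev/null bs=8K iflag=direct \
    || echo "direct read failed (O_DIRECT may not be supported on this filesystem)"
```

Comparing the two rates printed by dd gives a rough idea of how much of the buffered figure comes from caching rather than from the device.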

On Friday, September 20, 2013, Jason Villalta  wrote:
Mike, so I do have to ask: where would the extra latency be coming from if all my OSDs are on the same machine that my test VM is running on?  I have tried every SSD tweak in the book.  The main issue I see is with read performance of sequential IOs in the 4-8K range.  I would expect those to pull from three SSDs on a local machine at least as fast as one native SSD test.  But I don't see that; it's actually slower.

On Wed, Sep 18, 2013 at 4:02 PM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:

Thanks Mike, high hopes, right ;)

I guess we are not doing too badly compared to your numbers then.  I just wish the gap between native and Ceph per OSD was a little smaller.

C:\Program Files (x86)\SQLIO>sqlio -kW -t8 -s30 -o8 -fsequential -b1024 -BH -LS c:\TestFile.dat
sqlio v1.5.SG
using system counter for latency timings, 100000000 counts per second

8 threads writing for 30 secs to file c:\TestFile.dat
        using 1024KB sequential IOs
        enabling multiple I/Os per thread with 8 outstanding
        buffering set to use hardware disk cache (but not file cache)

using current size: 10240 MB for file: c:\TestFile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   180.20
MBs/sec:   180.20
latency metrics:

Min_Latency(ms): 39
Avg_Latency(ms): 352
Max_Latency(ms): 692
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 100
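As a rough cross-check of the figures above, Little's law relates the number of I/Os in flight to throughput and average latency: latency = outstanding I/Os / IOPS. The sketch below is an illustration only, assuming all 8 threads keep their 8 I/Os outstanding for the whole run; the function name avg_latency_ms is made up for this example.

```shell
# Little's law: average latency = outstanding I/Os / IOPS.
# Arguments: threads, outstanding I/Os per thread, measured IOs/sec.
# Prints the implied average latency in milliseconds.
avg_latency_ms() {
    awk -v t="$1" -v o="$2" -v iops="$3" \
        'BEGIN { printf "%.0f\n", (t * o) / iops * 1000 }'
}

# Values from the sqlio run above: -t8, -o8, 180.20 IOs/sec.
avg_latency_ms 8 8 180.20
```

This prints 355, close to the reported Avg_Latency of 352 ms. With 64 large I/Os queued, most of that latency is time spent waiting in the queue, so the latency figure here mostly restates the throughput.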

On Wed, Sep 18, 2013 at 3:55 PM, Mike Lowe <j.michael.lowe@xxxxxxxxx> wrote:

Well, in a word, yes. You really expect a network replicated storage system in user space to be comparable to direct attached SSD storage?  For what it's worth, I've got a pile of regular spinning rust; this is what my cluster will do inside a VM with rbd writeback caching on.  As you can see, latency is everything.

dd if=/dev/zero of=1g bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.26289 s, 171 MB/s

dd if=/dev/zero of=1g bs=1M count=1024 oflag=dsync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 37.4144 s, 28.7 MB/s

As you can see, latency is a killer.
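To make the latency point concrete, the two dd runs above can be turned into an average time per 1 MiB write by dividing elapsed time by the record count. This is a sketch for illustration; the function name per_write_ms is made up for this example.

```shell
# Average time per write in milliseconds.
# Arguments: elapsed seconds, number of records written.
per_write_ms() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.1f\n", s * 1000 / n }'
}

per_write_ms 6.26289 1024   # buffered run (no oflag)
per_write_ms 37.4144 1024   # oflag=dsync run
```

This prints 6.1 and 36.5. With oflag=dsync each write waits until the data has reached the storage before the next one is issued, so the roughly 36.5 ms per write is paid serially, which is why the throughput drops to 28.7 MB/s.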

On Sep 18, 2013, at 3:23 PM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:

Any other thoughts on this thread, guys?  Am I just crazy to want near-native SSD performance on a small SSD cluster?

On Wed, Sep 18, 2013 at 8:21 AM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:

That dd gives me this:
dd if=ddbenchfile of=- bs=8K | dd if=- of=/dev/null bs=8K

8192000000 bytes (8.2 GB) copied, 31.1807 s, 263 MB/s 

Which makes sense, because the SSD is running on SATA 2, which should give 3 Gbps or ~300 MB/s.
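The arithmetic behind that ceiling, as a short sketch. It relies on the fact that SATA uses 8b/10b line encoding, so each data byte costs 10 bits on the wire; protocol overhead is ignored, and the function names are made up for this example.

```shell
# Payload ceiling in MB/s for a SATA link, given the line rate in Gbit/s.
# With 8b/10b encoding, 10 line bits carry one data byte.
sata_payload_mbs() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 10 }'
}

# Observed rate in MB/s (10^6 bytes per second, as dd reports it).
# Arguments: bytes copied, elapsed seconds.
dd_rate_mbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

sata_payload_mbs 3                # SATA 2 line rate
dd_rate_mbs 8192000000 31.1807    # dd result quoted above
```

This prints 300 and 263, so the native dd read is running at a little under 90% of what the link can carry.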

I am still trying to better understand the speed difference between the small-block speeds seen with dd and the same small object size with rados.  It is not a difference of a few MB per sec.  It seems to be nearly a factor of 10.  I just want to know if this is a hard limit in Ceph or a factor of the underlying disk speed.  Meaning, if I used spindles to read data, would the speed be the same, or would the read speed be a factor of 10 less than the speed of the underlying disk?
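One way to frame the question is as a time budget per 8K block. The sketch below converts the native dd result from this message into microseconds per block and blocks per second. It is an illustration only, assuming one block is in flight at a time; the function names are made up for this example.

```shell
# Average time per block in microseconds.
# Arguments: total bytes copied, elapsed seconds, block size in bytes.
per_block_us() {
    awk -v b="$1" -v s="$2" -v bs="$3" \
        'BEGIN { printf "%.1f\n", s * 1e6 / (b / bs) }'
}

# Blocks per second for the same three arguments.
blocks_per_sec() {
    awk -v b="$1" -v s="$2" -v bs="$3" \
        'BEGIN { printf "%.0f\n", (b / bs) / s }'
}

per_block_us 8192000000 31.1807 8192
blocks_per_sec 8192000000 31.1807 8192
```

This prints 31.2 and 32071. A tenfold slowdown at the same block size therefore corresponds to roughly 310 microseconds per 8K read, so any fixed per-request overhead of that order would be enough to account for the gap regardless of how fast the underlying disk is.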

On Wed, Sep 18, 2013 at 4:27 AM, Alex Bligh <alex@xxxxxxxxxxx> wrote:

On 17 Sep 2013, at 21:47, Jason Villalta wrote:

> dd if=ddbenchfile of=/dev/null bs=8K

-- 
-ja. Sent via mobile.

-- 

Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com