Re: Ceph performance with 8K blocks.

Thanks Jamie,
I have not tried bonnie++. I was trying to keep it to sequential IO for comparison, since that is all rados bench can do. I did do a full IO test in a Windows VM using SQLIO, so I have read/write, sequential/random numbers for 4K/8K/64K blocks from that test. I also have access to a Dell EqualLogic, so I was using that as a high-end benchmark with the same SQLIO tests. Same goes for a single Intel SSD 320 my partner has. I can attach that if you want to look at it.
In those tests the random IO was not too bad between the EqualLogic and Ceph, but the sequential IO was not as close (possibly because of the huge cache in the EqualLogic). I am still bothered by the zero difference in performance between using 1 SSD disk vs 2 vs 3 disks. I would think there would be an increase in reads at least.
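For reference, the 8K random-read case was run along the same lines as the sequential SQLIO run quoted further down in this thread, just with the read/random flags and an 8K block (exact file and duration may have differed):

sqlio -kR -t8 -s30 -o8 -frandom -b8 -BH -LS c:\TestFile.dat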



On Fri, Sep 20, 2013 at 7:44 PM, Jamie Alquiza <ja@xxxxxxxxxxxxxxxxx> wrote:
The iflag addition should help with at least getting more accurate reads out of dd, but in terms of actually testing performance, have you tried sysbench or bonnie++?

I'd be curious how things change with multiple IO threads, as dd isn't necessarily a good performance investigation tool (you're really testing "dd performance" as opposed to "using dd to test performance") if the concern is what to expect from your multi-tenant VM block store.

Personally, I get more bugged out over many-thread random read throughput or synchronous write latency.
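For example, something roughly like this with sysbench's fileio mode (block size, thread count and runtime are just placeholders to tune for your workload):

sysbench --test=fileio --file-total-size=8G prepare
sysbench --test=fileio --file-total-size=8G --file-test-mode=rndrd --file-block-size=8K --num-threads=16 --max-time=60 run
sysbench --test=fileio --file-total-size=8G --file-test-mode=rndwr --file-block-size=8K --file-fsync-freq=1 --num-threads=16 --max-time=60 run

The rndrd run gives you many-thread random read throughput, and the rndwr run with an fsync per request approximates synchronous write latency.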

On Friday, September 20, 2013, Jason Villalta wrote:
Thanks Jamie,

I tried that too, but got similar results. The issue looks like it may be latency, but everything is running on one server, so logically I would think there would be no latency; according to this, though, there may be something causing the slow results. See Co-Residency.

I have not found a way to prove this to be true other than testing many different configurations of OSDs and drives. At one point I had 3 OSDs all running on one SSD drive. The performance was the same as when the three OSDs were running on 3 separate SSD drives. It seems like there is something else going on here.

Also, I ran iotop while running rados bench and SQLIO in the virtual machine. Writes max out at 200-300 MB/s for the duration of the test. Reads never hit a sustained rate anywhere near that speed.
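The rados bench runs were roughly of this form (the pool name is just a placeholder; --no-cleanup keeps the written objects around so the seq pass has something to read):

rados bench -p testpool 60 write -b 8192 -t 16 --no-cleanup
rados bench -p testpool 60 seq -t 16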



On Fri, Sep 20, 2013 at 7:18 PM, Jamie Alquiza <ja@xxxxxxxxxxxxxxxxx> wrote:
I thought I'd just throw this in there, as I've been following this thread: dd also has an 'iflag' directive just like the 'oflag'. 

I don't have a deep, offhand recollection of the caching mechanisms at play here, but assuming you want a solid synchronous / non-cached read, you should probably specify 'iflag=direct'.
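Something along these lines, for instance, reusing your existing benchmark file:

dd if=ddbenchfile of=/dev/null bs=8K iflag=direct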

On Friday, September 20, 2013, Jason Villalta wrote:
Mike,
So I do have to ask: where would the extra latency be coming from if all my OSDs are on the same machine my test VM is running on? I have tried every SSD tweak in the book. The primary concern I see is with read performance of sequential IOs in the 4-8K range. I would expect those to pull from three SSD disks on a local machine at least as fast as one native SSD test, but I don't see that; it's actually slower.


On Wed, Sep 18, 2013 at 4:02 PM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:
Thanks Mike,
High hopes, right? ;)

I guess we are not doing too bad compared to your numbers then. I just wish the gap between native and Ceph per OSD was a little closer.

C:\Program Files (x86)\SQLIO>sqlio -kW -t8 -s30 -o8 -fsequential -b1024 -BH -LS c:\TestFile.dat
sqlio v1.5.SG
using system counter for latency timings, 100000000 counts per second
8 threads writing for 30 secs to file c:\TestFile.dat
        using 1024KB sequential IOs
        enabling multiple I/Os per thread with 8 outstanding
        buffering set to use hardware disk cache (but not file cache)
using current size: 10240 MB for file: c:\TestFile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec:   180.20
MBs/sec:   180.20
latency metrics:
Min_Latency(ms): 39
Avg_Latency(ms): 352
Max_Latency(ms): 692
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%:  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 100



On Wed, Sep 18, 2013 at 3:55 PM, Mike Lowe <j.michael.lowe@xxxxxxxxx> wrote:
Well, in a word, yes. Do you really expect a network-replicated storage system running in user space to be comparable to direct-attached SSD storage? For what it's worth, I've got a pile of regular spinning rust; this is what my cluster will do inside a VM with RBD writeback caching on. As you can see, latency is everything.

dd if=/dev/zero of=1g bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 6.26289 s, 171 MB/s
dd if=/dev/zero of=1g bs=1M count=1024 oflag=dsync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 37.4144 s, 28.7 MB/s

As you can see, latency is a killer.
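If it helps, the writeback caching side of that is just a couple of client-side ceph.conf settings, something like the following (these are the defaults as far as I recall, and with a recent QEMU the drive also needs cache=writeback set for the RBD cache to engage):

[client]
    rbd cache = true
    rbd cache size = 33554432        # 32 MB of cache per image
    rbd cache max dirty = 25165824   # > 0 means writeback rather than writethrough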

On Sep 18, 2013, at 3:23 PM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:

Any other thoughts on this thread, guys? Am I just crazy to want near-native SSD performance on a small SSD cluster?


On Wed, Sep 18, 2013 at 8:21 AM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:
That dd gives me this:

dd if=ddbenchfile of=- bs=8K | dd if=- of=/dev/null bs=8K
8192000000 bytes (8.2 GB) copied, 31.1807 s, 263 MB/s 

Which makes sense, because the SSD is running as SATA 2, which gives a 3 Gbps link rate, or roughly 300 MB/s of payload after 8b/10b encoding.

I am still trying to better understand the speed difference between the small-block speeds seen with dd vs the same small object size with rados. It is not a small difference.
--
Jason Villalta


--
-ja. Sent via mobile.



--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
