Re: Ceph performance with 8K blocks.

Also enabling rbd writeback caching will allow requests to be merged,
which will help a lot for small sequential I/O.
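(A minimal sketch of what enabling that might look like in ceph.conf on the client side, assuming librbd reads the config and the VM layer permits writeback; the second line is a safety knob that may or may not exist in your version:)

  [client]
      rbd cache = true
      rbd cache writethrough until flush = true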

On 09/17/2013 02:03 PM, Gregory Farnum wrote:
Try it with oflag=dsync instead? I'm curious what kind of variation
these disks will provide.
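(For reference, a dsync version of the same write test might look like this; each 8K write is then synced individually instead of only buffered:)

  dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=dsync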

Anyway, you're not going to get the same kind of performance with
RADOS on 8k sync IO that you will with a local FS. It needs to
traverse the network and go through work queues in the daemon; your
primary limiter here is probably the per-request latency that you're
seeing (average ~30 ms, looking at the rados bench results). The good
news is that means you should be able to scale out to a lot of
clients, and if you don't force those 8k sync IOs (which RBD won't,
unless the application asks for them by itself using directIO or
frequent fsync or whatever) your performance will go way up.
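(As an untested sketch of that scaling: raising the number of requests in flight with rados bench's -t option should push aggregate throughput up even though per-request latency stays about the same, e.g.:)

  rados bench -b 8192 -p pbench 30 write -t 64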
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:

Here are the stats with direct io.

dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

These numbers are still much faster overall than what I see with RADOS bench.
The replica count is set to 2.  The journals are on the same disks as the OSDs, but in separate partitions.

I kept the block size the same, at 8K.




On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill <bcampbell@xxxxxxxxxxxxxxxxxxxx> wrote:

As Gregory mentioned, your 'dd' test looks to be reading from the cache (you are writing 8GB in and then reading that same 8GB out, so the reads are all cached reads), which is why the performance seems so good.  You can add 'oflag=direct' to the dd write test (and 'iflag=direct' to the read test) to get a more accurate reading.

RADOS performance from what I've seen is largely going to hinge on replica size and journal location.  Are your journals on separate disks or on the same disk as the OSD?  What is the replica size of your pool?
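(If it helps, something like the following should show the replica count and pool settings, assuming the pool is still named pbench:)

  ceph osd pool get pbench size
  ceph osd dump | grep pbench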

________________________________
From: "Jason Villalta" <jason@xxxxxxxxxxxx>
To: "Bill Campbell" <bcampbell@xxxxxxxxxxxxxxxxxxxx>
Cc: "Gregory Farnum" <greg@xxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, September 17, 2013 11:31:43 AM

Subject: Re:  Ceph performance with 8K blocks.

Thanks for your feedback; it is helpful.

I may have been wrong about the default Windows block size.  What would be the best tests to compare the native performance of the SSD disks at 4K blocks against Ceph performance at 4K blocks?  It just seems there is a huge difference in the results.
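(For example, would something like the following be a fair comparison, running both sides at 4K with the page cache taken out of the picture?)

  dd of=ddbenchfile if=/dev/zero bs=4K count=1000000 oflag=direct
  rados bench -b 4096 -p pbench 30 write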


On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <bcampbell@xxxxxxxxxxxxxxxxxxxx> wrote:

The Windows default (NTFS) is a 4K allocation unit.  Are you changing the allocation unit to 8K as a default in your configuration?

________________________________
From: "Gregory Farnum" <greg@xxxxxxxxxxx>
To: "Jason Villalta" <jason@xxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Tuesday, September 17, 2013 10:40:09 AM
Subject: Re:  Ceph performance with 8K blocks.


Your 8k-block dd test is not nearly the same as your 8k-block rados bench or SQL tests. Both rados bench and SQL require the write to be committed to disk before moving on to the next one; dd is simply writing into the page cache. So you're not going to get 460 or even 273MB/s with sync 8k writes regardless of your settings.
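(To make the difference visible, one could compare three variants of the same dd: plain (page cache only), conv=fdatasync (one flush at the end), and oflag=dsync (a sync per 8K write); throughput typically drops sharply from the first to the last:)

  dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
  dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 conv=fdatasync
  dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=dsync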

However, I think you should be able to tune your OSDs into somewhat better numbers -- that rados bench run is giving you ~300 IOPS on every OSD (with a small pipeline!), and an SSD-based daemon should be going faster. What kind of logging are you running with, and what configs have you set?
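(As one hedged example of the kind of config that matters: if debug logging is still at its defaults, turning it down on the OSD hosts in ceph.conf can sometimes buy a noticeable amount of small-IO throughput:)

  [osd]
      debug osd = 0/0
      debug filestore = 0/0
      debug journal = 0/0
      debug ms = 0/0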

Hopefully you can get Mark or Sam or somebody who's done some performance tuning to offer some tips as well. :)
-Greg

On Tuesday, September 17, 2013, Jason Villalta wrote:

Hello all,
I am new to the list.

I have a single machine set up for testing Ceph.  It has dual 6-core processors (12 cores total) and 128GB of RAM.  I also have 3 Intel 520 240GB SSDs, with an OSD on each disk; the OSD data and journal live in separate partitions formatted with ext4.

My goal here is to prove just how fast Ceph can go and what kind of performance to expect when using it as back-end storage for virtual machines, mostly Windows.  I would also like to understand how it scales IO by removing one of the three disks and rerunning the benchmarks, but that is secondary.  So far, here are my results.  I am aware this is all sequential; I just want to know how fast it can go.

dd IO test of the SSD disks:  I am testing 8K blocks since that is the default block size of Windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

RADOS bench test with 3 SSD disks and 4MB object size (default):
rados --no-cleanup bench -p pbench 30 write
Total writes made:      2061
Write size:             4194304
Bandwidth (MB/sec):     273.004

Stddev Bandwidth:       67.5237
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 0
Average Latency:        0.234199
Stddev Latency:         0.130874
Max latency:            0.867119
Min latency:            0.039318
-----
rados bench -p pbench 30 seq
Total reads made:     2061
Read size:            4194304
Bandwidth (MB/sec):    956.466

Average Latency:       0.0666347
Max latency:           0.208986
Min latency:           0.011625

This all looks about like I would expect from three disks.  The problems appear to come with the 8K block/object size.

RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
rados --no-cleanup bench -b 8192 -p pbench 30 write
Total writes made:      13770
Write size:             8192
Bandwidth (MB/sec):     3.581

Stddev Bandwidth:       1.04405
Max bandwidth (MB/sec): 6.19531
Min bandwidth (MB/sec): 0
Average Latency:        0.0348977
Stddev Latency:         0.0349212
Max latency:            0.326429
Min latency:            0.0019
------
rados bench -b 8192 -p pbench 30 seq
Total reads made:     13770
Read size:            8192
Bandwidth (MB/sec):    52.573

Average Latency:       0.00237483
Max latency:           0.006783
Min latency:           0.000521

So are these performance numbers correct, or is there something I missed in the testing procedure?  The RADOS bench numbers with 8K block size are the same ones we see when testing performance in a VM with SQLIO.  Does anyone know of any configuration changes needed to get Ceph performance closer to native performance with 8K blocks?

Thanks in advance.



--
--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com



--
Software Engineer #42 @ http://inktank.com | http://ceph.com







--
--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com






--
--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



