Try it with oflag=dsync instead? I'm curious what kind of variation these disks will provide.

Anyway, you're not going to get the same kind of performance with RADOS on 8k sync IO that you will with a local FS. It needs to traverse the network and go through work queues in the daemon; your primary limiter here is probably the per-request latency that you're seeing (average ~30 ms, looking at the rados bench results). The good news is that this means you should be able to scale out to a lot of clients, and if you don't force those 8k sync IOs (which RBD won't, unless the application asks for them itself using direct IO or frequent fsync or whatever), your performance will go way up.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta <jason@xxxxxxxxxxxx> wrote:
>
> Here are the stats with direct IO.
>
> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 oflag=direct
> 8192000000 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
>
> dd if=ddbenchfile of=/dev/null bs=8K
> 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
>
> These numbers are still overall much faster than when using RADOS bench.
> The replica count is set to 2. The journals are on the same disk but in separate partitions.
>
> I kept the block size the same (8K).
>
> On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill <bcampbell@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> As Gregory mentioned, your 'dd' test looks to be reading from the cache (you are writing 8GB in and then reading that 8GB out, so the reads are all cached reads), so the performance is going to seem good. You can add 'oflag=direct' to your dd test to try to get a more accurate reading.
>>
>> RADOS performance, from what I've seen, is largely going to hinge on replica size and journal location. Are your journals on separate disks or on the same disk as the OSD? What is the replica size of your pool?
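Greg's point that per-request latency is the limiter can be checked against Jason's 8K rados bench write results (quoted further down) with Little's law: sustained throughput equals the number of requests kept in flight divided by the average latency of each. This is a minimal sketch; the figure of 16 ops in flight is an assumption (rados bench's default `-t` value), not something stated in the thread, and the scale-out line is hypothetical.

```python
# Little's law: sustained ops/s = ops kept in flight / average latency per op.
# Figures are from Jason's 8K rados bench write run; 16 in flight is an
# ASSUMPTION (rados bench's default -t value), not stated in the thread.

AVG_LATENCY_S = 0.0348977   # "Average Latency" of the 8K write run, in seconds
TOTAL_WRITES = 13770        # "Total writes made"
RUN_SECONDS = 30            # requested bench duration
IN_FLIGHT = 16              # assumed concurrent ops per rados bench client


def iops(in_flight: int, latency_s: float) -> float:
    """Ops/s when `in_flight` requests are always outstanding, each taking `latency_s`."""
    return in_flight / latency_s


measured = TOTAL_WRITES / RUN_SECONDS
predicted = iops(IN_FLIGHT, AVG_LATENCY_S)

# Hypothetical scale-out: four clients with 16 ops each, IF per-op latency
# stayed near 35 ms. It will not once the SSDs or OSD threads saturate.
scaled = iops(4 * IN_FLIGHT, AVG_LATENCY_S)

print(f"measured:  {measured:.0f} IOPS")
print(f"predicted: {predicted:.0f} IOPS with {IN_FLIGHT} ops in flight")
print(f"4 clients: {scaled:.0f} IOPS (only if latency holds)")
```

The measured and predicted rates agree to within about 1%, which is what you would expect if each client is latency-bound rather than bandwidth-bound. That is also why adding clients, rather than tuning a single sync stream, is where the headroom is likely to be.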
>>
>> ________________________________
>> From: "Jason Villalta" <jason@xxxxxxxxxxxx>
>> To: "Bill Campbell" <bcampbell@xxxxxxxxxxxxxxxxxxxx>
>> Cc: "Gregory Farnum" <greg@xxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Tuesday, September 17, 2013 11:31:43 AM
>> Subject: Re: Ceph performance with 8K blocks.
>>
>> Thanks for your feedback; it is helpful.
>>
>> I may have been wrong about the default Windows block size. What would be the best tests to compare native performance of the SSD disks at 4K blocks vs Ceph performance with 4K blocks? It just seems there is a huge difference in the results.
>>
>> On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <bcampbell@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> The Windows (NTFS) default is a 4k block. Are you changing the allocation unit to 8k as a default for your configuration?
>>>
>>> ________________________________
>>> From: "Gregory Farnum" <greg@xxxxxxxxxxx>
>>> To: "Jason Villalta" <jason@xxxxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>> Sent: Tuesday, September 17, 2013 10:40:09 AM
>>> Subject: Re: Ceph performance with 8K blocks.
>>>
>>> Your 8k-block dd test is not nearly the same as your 8k-block rados bench or SQL tests. Both rados bench and SQL require the write to be committed to disk before moving on to the next one; dd is simply writing into the page cache. So you're not going to get 460 or even 273MB/s with sync 8k writes, regardless of your settings.
>>>
>>> However, I think you should be able to tune your OSDs into somewhat better numbers -- that rados bench is giving you ~300 IOPS on every OSD (with a small pipeline!), and an SSD-based daemon should be going faster. What kind of logging are you running with, and what configs have you set?
>>>
>>> Hopefully you can get Mark or Sam or somebody who's done some performance tuning to offer some tips as well. :)
>>> -Greg
>>>
>>> On Tuesday, September 17, 2013, Jason Villalta wrote:
>>>>
>>>> Hello all,
>>>> I am new to the list.
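Greg's "~300 IOPS on every OSD" can be reproduced from the 8K write results with a little arithmetic. This sketch assumes writes spread evenly over the 3 OSDs, that every client write is stored once per replica (pool size 2, per Jason's reply), and that with a FileStore journal on a partition of the same SSD each replica write hits that SSD twice (journal first, then the filesystem). These are simplifying assumptions, not measurements.

```python
# Per-OSD write load implied by the 8K rados bench run.
# ASSUMPTIONS: even spread over OSDs, pool size 2, and a co-located FileStore
# journal that doubles device-level writes on each SSD.

CLIENT_WRITES = 13770   # "Total writes made" in the 8K write run
RUN_SECONDS = 30
REPLICAS = 2            # Jason: "The replica count is set to 2"
NUM_OSDS = 3            # one OSD per SSD
BLOCK_BYTES = 8192


def per_osd_write_ops(client_ops_per_s: float, replicas: int, num_osds: int) -> float:
    """Replica writes each OSD must commit per second, assuming an even spread."""
    return client_ops_per_s * replicas / num_osds


def per_ssd_device_writes(osd_ops_per_s: float, journal_on_same_disk: bool) -> float:
    """Device-level writes per SSD: journal write plus filesystem write if co-located."""
    return osd_ops_per_s * (2 if journal_on_same_disk else 1)


client_ops = CLIENT_WRITES / RUN_SECONDS          # 459 client writes/s
osd_ops = per_osd_write_ops(client_ops, REPLICAS, NUM_OSDS)
device_ops = per_ssd_device_writes(osd_ops, journal_on_same_disk=True)

print(f"{osd_ops:.0f} replica writes/s per OSD")
print(f"{device_ops:.0f} device writes/s per SSD = {device_ops * BLOCK_BYTES / 2**20:.2f} MiB/s")
```

Even after doubling for the journal, each SSD is only absorbing a few MiB/s of 8K writes, which is consistent with Greg's view that an SSD-backed OSD should be able to go faster and that the bottleneck is per-request overhead rather than raw disk bandwidth.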
>>>>
>>>> I have a single-machine setup for testing Ceph. It has two 6-core processors (12 cores total) and 128GB of RAM. I also have 3 Intel 520 240GB SSDs, with an OSD set up on each disk and the OSD and journal in separate partitions formatted with ext4.
>>>>
>>>> My goal here is to prove just how fast Ceph can go and what kind of performance to expect when using it as back-end storage for virtual machines, mostly Windows. I would also like to understand how IO scales by removing one of the three disks and rerunning the benchmark tests, but that is secondary. So far, here are my results. I am aware this is all sequential; I just want to know how fast it can go.
>>>>
>>>> dd IO test of SSD disks: I am testing 8K blocks since that is the default block size of Windows.
>>>> dd of=ddbenchfile if=/dev/zero bs=8K count=1000000
>>>> 8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
>>>>
>>>> dd if=ddbenchfile of=/dev/null bs=8K
>>>> 8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s
>>>>
>>>> RADOS bench test with 3 SSD disks and 4MB object size (default):
>>>> rados --no-cleanup bench -p pbench 30 write
>>>> Total writes made: 2061
>>>> Write size: 4194304
>>>> Bandwidth (MB/sec): 273.004
>>>>
>>>> Stddev Bandwidth: 67.5237
>>>> Max bandwidth (MB/sec): 352
>>>> Min bandwidth (MB/sec): 0
>>>> Average Latency: 0.234199
>>>> Stddev Latency: 0.130874
>>>> Max latency: 0.867119
>>>> Min latency: 0.039318
>>>> -----
>>>> rados bench -p pbench 30 seq
>>>> Total reads made: 2061
>>>> Read size: 4194304
>>>> Bandwidth (MB/sec): 956.466
>>>>
>>>> Average Latency: 0.0666347
>>>> Max latency: 0.208986
>>>> Min latency: 0.011625
>>>>
>>>> This all looks like what I would expect from three disks. The problems appear to come with the 8K block/object size.
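The 2.8 GB/s dd read above is the clearest sign of the page-cache effect Greg and Bill describe. A SATA III link signals at 6 Gbit/s with 8b/10b encoding (10 line bits per data byte), so a single drive behind it cannot deliver more than 600 MB/s of payload. This sketch assumes ddbenchfile sat on one Intel 520 attached at SATA III; the thread does not say which disk held it.

```python
# Could a dd rate have come off the SSD at all?
# ASSUMPTION: ddbenchfile lived on a single SATA III (6 Gbit/s) device.

SATA3_BITS_PER_S = 6_000_000_000
SATA3_MAX_BYTES_PER_S = SATA3_BITS_PER_S / 10   # 8b/10b: 10 line bits per data byte


def dd_rate(bytes_copied: int, seconds: float) -> float:
    """Bytes per second, computed the way dd reports it (decimal units)."""
    return bytes_copied / seconds


def exceeds_sata3(rate_bytes_per_s: float) -> bool:
    """True if the rate is impossible for a single SATA III device link."""
    return rate_bytes_per_s > SATA3_MAX_BYTES_PER_S


runs = {
    "buffered write": dd_rate(8_192_000_000, 17.7953),
    "read after buffered write": dd_rate(8_192_000_000, 2.94287),
    "read after direct write": dd_rate(8_192_000_000, 19.7318),
}

for name, rate in runs.items():
    # Staying under the link limit does not prove the data touched the disk.
    verdict = "must be page cache" if exceeds_sata3(rate) else "within link limit"
    print(f"{name:26s} {rate / 1e6:6.0f} MB/s  {verdict}")
```

Only the first read exceeds the link ceiling; the 460 MB/s buffered write is physically possible for the drive but, per Greg, was still landing in the page cache rather than being committed.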
>>>>
>>>> RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
>>>> rados --no-cleanup bench -b 8192 -p pbench 30 write
>>>> Total writes made: 13770
>>>> Write size: 8192
>>>> Bandwidth (MB/sec): 3.581
>>>>
>>>> Stddev Bandwidth: 1.04405
>>>> Max bandwidth (MB/sec): 6.19531
>>>> Min bandwidth (MB/sec): 0
>>>> Average Latency: 0.0348977
>>>> Stddev Latency: 0.0349212
>>>> Max latency: 0.326429
>>>> Min latency: 0.0019
>>>> ------
>>>> rados bench -b 8192 -p pbench 30 seq
>>>> Total reads made: 13770
>>>> Read size: 8192
>>>> Bandwidth (MB/sec): 52.573
>>>>
>>>> Average Latency: 0.00237483
>>>> Max latency: 0.006783
>>>> Min latency: 0.000521
>>>>
>>>> So are these performance numbers correct, or is this something I missed in the testing procedure? The RADOS bench numbers with an 8K block size are the same as what we see when testing performance in a VM with SQLIO. Does anyone know of any configuration changes that are needed to get Ceph performance closer to native performance with 8K blocks?
>>>>
>>>> Thanks in advance.
>>>>
>>>> --
>>>> Jason Villalta
>>>> Co-founder
>>>> 800.799.4407x1230 | www.RubixTechnology.com
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> NOTICE: Protect the information in this message in accordance with the company's security policies. If you received this message in error, immediately notify the sender and destroy all copies.
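To put Jason's "are these numbers correct?" in proportion, here is a rough comparison of his oflag=direct dd run against the 8K rados bench write, using only numbers posted in this thread. Two caveats, both assumptions worth labelling: oflag=direct bypasses the page cache but does not request O_DSYNC (which is why Greg suggests oflag=dsync), so the local figure is optimistic; and the RADOS run kept several ops in flight (assumed 16, the rados bench default), so its average latency includes queueing.

```python
# Rough local-vs-RADOS comparison for 8K writes from the posted results.
# dd issues one write at a time, so its per-write latency is 1 / IOPS.
# The RADOS run was pipelined (ASSUMED 16 in flight), so its latency
# includes time spent queued behind other requests.

DD_WRITES = 1_000_000           # bs=8K count=1000000 oflag=direct
DD_SECONDS = 68.4789
RADOS_WRITES = 13770            # rados bench -b 8192 ... 30 write
RADOS_SECONDS = 30
RADOS_AVG_LATENCY_S = 0.0348977


def ops_per_second(ops: int, seconds: float) -> float:
    return ops / seconds


local_iops = ops_per_second(DD_WRITES, DD_SECONDS)
rados_iops = ops_per_second(RADOS_WRITES, RADOS_SECONDS)
local_latency_s = 1 / local_iops    # queue depth 1

print(f"local dd (direct): {local_iops:,.0f} IOPS, {local_latency_s * 1e6:.1f} us per write")
print(f"RADOS 8K write:    {rados_iops:,.0f} IOPS, {RADOS_AVG_LATENCY_S * 1e3:.1f} ms per write")
print(f"throughput gap: {local_iops / rados_iops:.0f}x")
print(f"latency gap:    {RADOS_AVG_LATENCY_S / local_latency_s:.0f}x")
```

The throughput gap (roughly 30x) is smaller than the per-request latency gap (roughly 500x) because rados bench overlaps requests. Rerunning dd with oflag=dsync would give a fairer local baseline for sync writes.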