Re: Understanding write performance

Hi Christian,
Thank you for the follow-up on this. 
 
I answered those questions inline below.
 
Have a good day,
 
Lewis George
 

From: "Christian Balzer" <chibi@xxxxxxx>
Sent: Thursday, August 18, 2016 6:31 PM
To: ceph-users@xxxxxxxxxxxxxx
Cc: "lewis.george@xxxxxxxxxxxxx" <lewis.george@xxxxxxxxxxxxx>
Subject: Re: Understanding write performance
 

Hello,

On Thu, 18 Aug 2016 12:03:36 -0700 lewis.george@xxxxxxxxxxxxx wrote:

>> Hi,
>> So, I have really been trying to find information about this without
>> annoying the list, but I just can't seem to get any clear picture of it. I
>> was going to try to search the mailing list archive, but it seems there is
>> an error when trying to search it right now (posting below, and sending to
>> the listed address in error).
>>
>Google (as in all the various archives of this ML) works well for me,
>as always the results depend on picking "good" search strings.
>
>> I have been working for a couple of months now (slowly) on testing out
>> Ceph. I only have a small PoC setup. I have 6 hosts, but I am only using 3
>> of them in the cluster at the moment. They each have 6 SSDs (only 5 usable
>> by Ceph), but the networks (1 public, 1 cluster) are only 1Gbps. I have the
>> MONs running on the same 3 hosts, and I have an OSD process running for
>> each of the 5 disks per host. The cluster shows in good health, with 15
>> OSDs. I have one pool there, the default rbd, which I set up with 512 PGs.
>>
>Exact SSD models, please.
>Also CPU, though at 1GbE that isn't going to be your problem.
 
#Lewis: Each SSD is this model:
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 PRO Series
 
Each of the 3 nodes has 2 x Intel E5645, with 48GB of memory.

>> I have created an rbd image on the pool, and I have it mapped and mounted
>> on another client host.
>Mapped via the kernel interface?
 
#Lewis: On the client node (which has the same specs as the other 3), I used the 'rbd map' command to map a 100GB rbd image to rbd0, then created an XFS filesystem on it and mounted it.
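 
For reference, the sequence was roughly the following (the image name here is just illustrative; the mount point matches the dd paths further down):
 
rbd create rbd/test1 --size 102400
rbd map rbd/test1
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt/set1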

>>When doing write tests, like with 'dd', I am
>> getting rather spotty performance.
>Example dd command line please.
 
#Lewis: I put those below.

>> Not only is it up and down, but even
>> when it is up, the performance isn't that great. On largeish (4GB
>> sequential) writes, it averages about 65MB/s, and on repeated smaller (40MB)
>> sequential writes, it jumps around between 20MB/s and 80MB/s.
>>
>Monitor your storage nodes during these test runs with atop (or iostat)
>and see how busy your actual SSDs are then.
>Also test with "rados bench" to get a base line.
 
#Lewis: I have all the nodes instrumented with collectd. I am seeing each disk writing at only ~25MB/s during the write tests. I will check out the 'rados bench' command, as I have not tried it yet.
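 
If I am reading the man page right, something along these lines should give a baseline for the pool (the -b/-t values are just a first guess on my part):
 
rados bench -p rbd 60 write -b 4194304 -t 16 --no-cleanup
rados bench -p rbd 60 seq -t 16
rados -p rbd cleanup
 
I can also watch the disks on the OSD nodes during the runs with something like 'iostat -xm 2'.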

>> However, with read tests, I am able to completely max out the network
>> there, easily reaching 125MB/s. Tests on the disks directly are able to get
>> up to 550MB/s reads and 350MB/s writes. So, I know it isn't a problem with
>> the disks.
>>
>How did you test these speeds? Exact command lines, please.
>There are SSDs that can write very fast with buffered I/O but are
>abysmally slow with sync/direct I/O.
>Which is what Ceph journals use.
 
#Lewis: I have mostly been testing with just dd, though I have also run several fio tests. With dd, I have tested writing 4GB files with both 4k and 1M block sizes (I get about the same results, on average).
 
dd if=/dev/zero of=/mnt/set1/testfile700 bs=4k count=1000000 conv=fsync
dd if=/dev/zero of=/mnt/set1/testfile700 bs=1M count=4000 conv=fsync
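 
If it would help separate page-cache effects from sync behaviour, I can re-run the same tests with direct/sync flags against the same file, e.g.:
 
dd if=/dev/zero of=/mnt/set1/testfile700 bs=1M count=4000 oflag=direct
dd if=/dev/zero of=/mnt/set1/testfile700 bs=4k count=100000 oflag=dsync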

>See the various threads in here and the "classic" link:
>https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
 
#Lewis: I have been reading over a lot of his articles. They are really good. I had not seen that one. Thank you for pointing it out.
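 
If I follow that article correctly, the journal-style test boils down to a single-job sync write with fio, something like the following (run against a spare, unused device or partition, since writing to the raw device is destructive; /dev/sdX is a placeholder):
 
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
 
I will try that against one of the 840 PROs before trusting the raw dd numbers from the disks.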

>> I guess my question is, are there any additional optimizations or tuning I
>> should review here? I have read over all the docs, but I don't know which,
>> if any, of the values would need tweaking. Also, I am not sure if this is
>> just how it is with Ceph, given the need to write multiple copies of each
>> object. Is the slower write performance (averaging ~1/2 of the network
>> throughput) to be expected? I haven't seen any clear answer on that in the
>> docs or in the articles I have found. So, I am not sure if my
>> expectation is just wrong.
>>
>While the replication incurs some performance penalties, this is mostly an
>issue with small I/Os, not the type of large sequential writes you're
>doing.
>I'd expect a setup like yours to deliver more or less full line speed, if
>your network and SSDs are working correctly.
>
>In my crappy test cluster with an identical network setup to yours, 4
>nodes with 4 crappy SATA disks each (so 16 OSDs), I can get better and
>more consistent write speed than you, around 100MB/s.
>
>Christian
>
>> Anyway, some basic idea on those concepts or some pointers to some good
>> docs or articles would be wonderful. Thank you!
>>
>> Lewis George
>>
>>
>>
>
>
>--
>Christian Balzer Network/Systems Engineer
>chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
>http://www.gol.com/
 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
