Re: Understanding write performance

Hello,

On Thu, 18 Aug 2016 12:03:36 -0700 lewis.george@xxxxxxxxxxxxx wrote:

> Hi,
>  So, I have really been trying to find information about this without 
> annoying the list, but I just can't seem to get any clear picture of it. I 
> was going to try to search the mailing list archive, but it seems there is 
> an error when trying to search it right now(posting below, and sending to 
> listed address in error). 
>
Google (as in all the various archives of this ML) works well for me; as
always, the results depend on picking "good" search strings.
   
>  I have been working for a couple of months now(slowly) on testing out 
> Ceph. I only have a small PoC setup. I have 6 hosts, but I am only using 3 
> of them in the cluster at the moment. They each have 6xSSDs(only 5 usable 
> by Ceph), but the networks(1 public, 1 cluster) are only 1Gbps. I have the 
> MONs running on the same 3 hosts, and I have an OSD process running for 
> each of the 5 disks per host. The cluster shows in good health, with 15 
> OSDs. I have one pool there, the default rbd, which I setup with 512 PGs. 
>   
Exact SSD models, please.
Also CPU, though at 1GbE that isn't going to be your problem. 

>  I have created an rbd image on the pool, and I have it mapped and mounted 
> on another client host. 
Mapped via the kernel interface?
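A quick way to check, on the client:

  rbd showmapped

Kernel rbd and librbd behave quite differently performance-wise, so it
matters which one you're benchmarking.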

>When doing write tests, like with 'dd', I am 
> getting rather spotty performance. 
Example dd command line please.
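Buffered and direct/sync dd runs give very different numbers, so I'd compare
something along these lines (path and sizes are just examples):

  dd if=/dev/zero of=/mnt/rbd/testfile bs=4M count=1024 conv=fdatasync
  dd if=/dev/zero of=/mnt/rbd/testfile bs=4M count=1024 oflag=direct

The first goes through the page cache and flushes at the end, the second
bypasses the cache entirely.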

>Not only is it up and down, but even 
> when it is up, the performance isn't that great. On large'ish(4GB 
> sequential) writes, it averages about 65MB/s, and on repeated smaller(40MB) 
> sequential writes, it is jumping around between 20MB/s and 80MB/s. 
>
Monitor your storage nodes during these test runs with atop (or iostat)
and see how busy your actual SSDs are.
Also test with "rados bench" to get a baseline.
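For example, on a storage node:

  iostat -x 2

and from the client (pool name and parameters are just a starting point):

  rados bench -p rbd 30 write -t 32

That takes the kernel client, the filesystem and the rbd layer out of the
picture and shows what the cluster itself can sustain.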
   
>  However, with read tests, I am able to completely max out the network 
> there, easily reaching 125MB/s. Tests on the disks directly are able to get 
> up to 550MB/s reads and 350MB/s writes. So, I know it isn't a problem with 
> the disks.
>
How did you test these speeds? Exact command line, please.
There are SSDs that can write very fast with buffered I/O but are
abysmally slow with sync/direct I/O, which is what Ceph journals use.

See the various threads in here and the "classic" link:
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
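That test boils down to small sync'ed writes, roughly something like this
(DESTRUCTIVE if pointed at a raw device, so use a spare partition or a file;
the device name is just a placeholder):

  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 \
      --time_based --group_reporting

A decent journal SSD still manages tens of MB/s here, while many consumer
models drop to a few MB/s or worse.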

>  I guess my question is, is there any additional optimizations or tuning I 
> should review here. I have read over all the docs, but I don't know which, 
> if any, of the values would need tweaking. Also, I am not sure if this is 
> just how it is with Ceph, given the need to write multiple copies of each 
> object. Is the slower write performance(averaging ~1/2 of the network 
> throughput) to be expected? I haven't seen any clear answer on that in the 
> docs or in articles I have found around. So, I am not sure if my 
> expectation is just wrong. 
>   
While the replication incurs some performance penalties, this is mostly an
issue with small I/Os, not the type of large sequential writes you're
doing.
I'd expect a setup like yours to deliver more or less full line speed, if
your network and SSDs are working correctly. 
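Back of the envelope: 1Gb/s is roughly 117MB/s of usable TCP throughput,
and with a separate cluster network the replication traffic doesn't compete
with the client on the public link, so sequential writes somewhere in the
100-115MB/s range are what a healthy setup should show.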

In my crappy test cluster with an identical network setup to yours (4
nodes with 4 crappy SATA disks each, so 16 OSDs), I get better and more
consistent write speeds than you're seeing, around 100MB/s.

Christian

>  Anyway, some basic idea on those concepts or some pointers to some good 
> docs or articles would be wonderful. Thank you!
>   
>  Lewis George
>   
>   
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


