Re: Ceph Random Read Write Performance

Hello,

On Sun, 20 Aug 2017 18:07:09 +0700 Sam Huracan wrote:

> Hi,
> 
> I have a question about Ceph's performance
You really, really want to do yourself a favor and research things (aka
googling the archives of this ML).
Not a week or a month goes by without somebody asking this question.

> I've built a Ceph cluster with 3 OSD hosts, each host's configuration:
>  - CPU: 1 x Intel Xeon E5-2620 v4 2.1GHz
You will want higher clock speeds and fewer cores in general with SSDs and
NVMe drives.
Analyze your systems with atop etc. during these benchmarks and see if
you're CPU bound, which you might very well be.
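
A quick way to eyeball that without atop, as a minimal sketch in Python
(standard library only, reading Linux /proc/stat; atop/sar will of course
give you far more detail):

  #!/usr/bin/env python3
  # Sample overall CPU busy% while the benchmark runs; if this sits near
  # 100% on the OSD hosts, you are CPU bound.
  import time

  def cpu_times():
      # /proc/stat first line: cpu user nice system idle iowait irq softirq steal ...
      with open("/proc/stat") as f:
          fields = [int(x) for x in f.readline().split()[1:]]
      idle = fields[3] + fields[4]              # idle + iowait
      return idle, sum(fields)

  i1, t1 = cpu_times()
  time.sleep(5)                                 # sample while vdbench is running
  i2, t2 = cpu_times()
  print("CPU busy over sample: %.1f%%"
        % (100.0 * (1.0 - (i2 - i1) / (t2 - t1))))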

>  - Memory: 2 x 16GB RDIMM
While sufficient, that doesn't leave much room for pagecache and SLAB.

>  - Disk: 2 x 300GB 15K RPM SAS 12Gbps (RAID 1 for OS)
Unless these are behind a HW cache RAID controller, and assuming your
MONs are also on these 3 hosts, SSDs for the leveldb activity would be
better.


>             4 x 800GB Solid State Drive SATA (non-RAID, for OSD) (Intel SSD
> DC S3610)
Good luck getting any more of these.

>  - NIC: 1 x 10Gbps (bonding for both public and replicate network).
> 
1x doesn't sound like bonding; one assumes 4x 10Gbps?
Also, splitting the network makes _real_ performance sense in far fewer
scenarios than most people think.

> My ceph.conf: https://pastebin.com/r4pJ3P45
> We use this cluster for OpenStack cinder's backend.
> 
> We have benchmarked this cluster from 6 VMs, using vdbench.
> Our vdbench script:  https://pastebin.com/9sxhrjie
> 
> After testing, we got these results:
>  - 100% random read: 100,000 IOPS
>  - 100% random write: 20,000 IOPS
>  - 75% RR / 25% RW: 80,000 IOPS
> 
> Those results seem very low, because we calculated the performance of this
> cluster as 112,000 IOPS write and 1,000,000 IOPS read.
> 

You're making the same mistake as all the people before you (see above,
google) and expecting the local performance of your SSDs when you're in
fact dealing with the much more complex and involved Ceph stack.

Firstly, writes happen twice: once to the journal and once to the actual
OSD storage space.
So just 56K write IOPS, _if_ they were local.
Which they AREN'T.
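
For the record, the arithmetic behind that 56K, using nothing but the
number from your own mail (and assuming filestore journals co-located on
the same SSDs):

  claimed_local_write_iops = 112000  # your calculated cluster write ceiling
  journal_write_amplification = 2    # journal write + OSD store write
  print(claimed_local_write_iops / journal_write_amplification)  # 56000.0
  # ...and that is only _if_ everything were local, which it isn't.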

And now meet Mr. Latency, both in the network and the Ceph software stack.
This is where the rest of your performance gets lost.
The network part is unavoidable (a local SAS/SATA link is not the same as
a bonded 10Gbps link), though 25Gbps, IB, etc. can help.
The Ceph stack will benefit from faster CPUs as mentioned above.
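
To get a feel for how hard latency alone caps you, a toy calculation
(Little's law; the 2ms per-op figure is purely illustrative, measure your
own with fio or rados bench):

  # Achievable IOPS is roughly bounded by outstanding I/Os / per-op latency.
  def iops_ceiling(queue_depth, latency_s):
      return queue_depth / latency_s

  for qd in (1, 16, 64):
      print(qd, iops_ceiling(qd, 0.002))  # 500, 8000, 32000 IOPS at 2ms/op

Shaving per-op latency (faster CPUs, faster links, fewer hops) moves that
ceiling far more than faster SSDs would at this point.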

> We are using Ceph Jewel 10.2.5-1trusty, kernel 4.4.0-31-generic, Ubuntu
> 14.04
> 
Not exactly the latest crop of kernels, as a general comment.


Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


