Poor Ceph cluster performance

Hello,

I have a Ceph cluster deployed together with OpenStack using TripleO.
While the Ceph cluster shows a healthy status, its performance is
painfully slow. After ruling out network issues, I have zeroed in on
the Ceph cluster itself, but I have no experience with further
debugging and tuning.

The Ceph OSD part of the cluster uses 3 identical servers with the
following specifications:

CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Network: 1G port shared for Ceph public and cluster traffic
Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)

This is by no means beefy hardware, but I am only running a PoC with
minimal utilization.

The ceph-mon and ceph-mgr daemons are hosted on the OpenStack
Controller nodes. The deployment uses ceph-ansible 3.1 with FileStore
in the non-collocated scenario (1 SSD journal device for every 2
OSDs). Connection speed among the Controller, Compute, and OSD nodes
reaches ~900 Mbps, as tested with iperf.

I followed the Red Hat Ceph Storage 3 benchmarking procedure [1] and
got the following results:

Write Test:

Total time run:         80.313004
Total writes made:      17
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     0.846687
Stddev Bandwidth:       0.320051
Max bandwidth (MB/sec): 2
Min bandwidth (MB/sec): 0
Average IOPS:           0
Stddev IOPS:            0
Max IOPS:               0
Min IOPS:               0
Average Latency(s):     66.6582
Stddev Latency(s):      15.5529
Max latency(s):         80.3122
Min latency(s):         29.7059

Sequential Read Test:

Total time run:       25.951049
Total reads made:     17
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   2.62032
Average IOPS:         0
Stddev IOPS:          0
Max IOPS:             1
Min IOPS:             0
Average Latency(s):   24.4129
Max latency(s):       25.9492
Min latency(s):       0.117732

Random Read Test:

Total time run:       66.355433
Total reads made:     46
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   2.77295
Average IOPS:         0
Stddev IOPS:          3
Max IOPS:             27
Min IOPS:             0
Average Latency(s):   21.4531
Max latency(s):       66.1885
Min latency(s):       0.0395266
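
For reference, the commands from that procedure boil down to the
following (the pool name testbench comes from the guide; the exact
run lengths I used may have differed):

# ceph osd pool create testbench 100 100
# rados bench -p testbench 10 write --no-cleanup
# rados bench -p testbench 10 seq
# rados bench -p testbench 10 rand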

Evidently, the results are pathetic...

When I moved on to testing block devices, I got the following error
message:

# rbd map image01 --pool testbench --name client.admin
rbd: failed to add secret 'client.admin' to kernel
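
My guess is that the kernel RBD client cannot load the client.admin
secret into the kernel keyring. What I plan to check next (the
keyring path below is just the default, so this is an assumption on
my part) is roughly:

# ls -l /etc/ceph/ceph.client.admin.keyring
# ceph auth get-key client.admin
# rbd map image01 --pool testbench --id admin --keyring /etc/ceph/ceph.client.admin.keyring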

Any suggestions on the above error and/or debugging would be greatly
appreciated!

Thank you very much to all.

Cody

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/administration_guide/#benchmarking_performance