Re: Poor ceph cluster performance

Hi,


Most likely the issue is with your consumer-grade journal SSD. Run the
following against your SSD to check whether it performs well enough:

fio --filename=<SSD DEVICE> --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test
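As a rough rule of thumb, an SSD that is suitable as a Filestore
journal should sustain at least a few thousand 4k sync-write IOPS in
that test; many consumer drives without power-loss protection manage
only a few hundred, which alone would explain rados bench numbers like
the ones below.

Two caveats: running fio in write mode against the raw <SSD DEVICE>
will overwrite whatever is on it, including a live journal, so only do
that on a drive you can wipe. If the SSD is already in use, you can get
a close-enough answer by pointing fio at a test file on a filesystem on
that SSD instead, for example (the path here is just a placeholder, and
results through a filesystem may differ slightly from the raw device):

fio --filename=/mnt/ssd/fio-journal-test --size=1G --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 \
    --time_based --group_reporting --name=journal-test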
On Tue, Nov 27, 2018 at 2:06 AM Cody <codeology.lab@xxxxxxxxx> wrote:
>
> Hello,
>
> I have a Ceph cluster deployed together with OpenStack using TripleO.
> While the Ceph cluster shows a healthy status, its performance is
> painfully slow. After eliminating the possibility of network issues,
> I have zeroed in on the Ceph cluster itself, but I have no experience
> with further debugging and tuning.
>
> The Ceph OSD part of the cluster uses 3 identical servers with the
> following specifications:
>
> CPU: 2 x E5-2603 @1.8GHz
> RAM: 16GB
> Network: 1G port shared for Ceph public and cluster traffic
> Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
> OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)
>
> This is not beefy by any means, but I am running it as a PoC only,
> with minimal utilization.
>
> Ceph-mon and ceph-mgr daemons are hosted on the OpenStack Controller
> nodes. Ceph-ansible version is 3.1, using Filestore with the
> non-collocated scenario (1 SSD for every 2 OSDs). Connection speed
> among Controllers, Computes, and OSD nodes can reach ~900 Mbps, as
> tested with iperf.
>
> I followed the Red Hat Ceph 3 benchmarking procedure [1] and received
> the following results:
>
> Write Test:
>
> Total time run:         80.313004
> Total writes made:      17
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     0.846687
> Stddev Bandwidth:       0.320051
> Max bandwidth (MB/sec): 2
> Min bandwidth (MB/sec): 0
> Average IOPS:           0
> Stddev IOPS:            0
> Max IOPS:               0
> Min IOPS:               0
> Average Latency(s):     66.6582
> Stddev Latency(s):      15.5529
> Max latency(s):         80.3122
> Min latency(s):         29.7059
>
> Sequential Read Test:
>
> Total time run:       25.951049
> Total reads made:     17
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   2.62032
> Average IOPS:         0
> Stddev IOPS:          0
> Max IOPS:             1
> Min IOPS:             0
> Average Latency(s):   24.4129
> Max latency(s):       25.9492
> Min latency(s):       0.117732
>
> Random Read Test:
>
> Total time run:       66.355433
> Total reads made:     46
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   2.77295
> Average IOPS:         0
> Stddev IOPS:          3
> Max IOPS:             27
> Min IOPS:             0
> Average Latency(s):   21.4531
> Max latency(s):       66.1885
> Min latency(s):       0.0395266
>
> Apparently, the results are pathetic...
>
> As I moved on to test block devices, I got the following error message:
>
> # rbd map image01 --pool testbench --name client.admin
> rbd: failed to add secret 'client.admin' to kernel
>
> Any suggestions on the above error and/or debugging would be greatly
> appreciated!
>
> Thank you very much to all.
>
> Cody
>
> [1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/administration_guide/#benchmarking_performance
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com