Hi, thanks for the reply.

> From the top of my head, it is recommended to use 3 mons in
> production. Also, for the 22 osds your number of PGs look a bit low,
> you should look at that.

I took the number from http://ceph.com/docs/master/rados/operations/placement-groups/:
(22 OSDs * 100) / 3 replicas = 733.33, rounded up to the next power of two = 1024 PGs.
Please correct me if I'm wrong.

There will be 5 mons (on 6 hosts), but first we have to migrate some data off the servers that are still in use.

> The performance of the cluster is poor - this is too vague. What is
> your current performance, what benchmarks have you tried, what is your
> data workload and most importantly, how is your cluster setup. what
> disks, ssds, network, ram, etc.
>
> Please provide more information so that people could help you.
>
> Andrei

Hardware information:

ceph15:
RAM: 4 GB
Network: 4x 1 GbE NIC
OSD disks:
2x SATA Seagate ST31000524NS
2x SATA WDC WD1003FBYX-18Y7B0

ceph25:
RAM: 16 GB
Network: 4x 1 GbE NIC
OSD disks:
2x SATA WDC WD7500BPKX-7
2x SATA WDC WD7500BPKX-2
2x SATA SSHD ST1000LM014-1EJ164

ceph30:
RAM: 16 GB
Network: 4x 1 GbE NIC
OSD disks:
6x SATA SSHD ST1000LM014-1EJ164

ceph35:
RAM: 16 GB
Network: 4x 1 GbE NIC
OSD disks:
6x SATA SSHD ST1000LM014-1EJ164

All journals are on the OSD disks. On each host, 2 NICs carry the backend (cluster) network (10.20.4.0/22) and 2 NICs carry the frontend (public) network (10.20.8.0/22).

We use this cluster as the storage backend for fewer than 100 VMs on KVM. I have not run any benchmarks yet (see the P.S. at the end). All the VMs were migrated from Xen + GlusterFS (NFS); before the migration every VM ran fine, but now each VM hangs for a few seconds from time to time, and applications on the VMs take much longer to load. The GlusterFS setup ran on 2 servers, each with 1x 1 GbE NIC and 8x WDC WD7500BPKX-7 disks.

I did one recovery test: when a disk is marked out, recovery I/O runs at 150-200 MB/s, but all the VMs hang until the recovery ends.
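So far I have not touched any recovery or backfill settings, everything is at the defaults. If throttling recovery is the right way to keep the VMs responsive during recovery, I was thinking about something like this in the [osd] section (the option names are the standard Ceph ones, but the values are only my first guess, not something I have tested yet):

[osd]
# values below are just an initial guess, not tested
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1

or setting it at runtime with:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

Please tell me if that is a sensible starting point for a cluster of this size.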
The biggest load is on ceph35: each disk does close to 150 IOPS and the CPU load is around 4-5. On the other hosts the CPU load is below 2, with roughly 120-130 IOPS per disk.

Our ceph.conf
===========
[global]
fsid = a9d17295-62f2-46f6-8325-1cad7724e97f
mon initial members = ceph35, ceph30, ceph25, ceph15
mon host = 10.20.8.35, 10.20.8.30, 10.20.8.25, 10.20.8.15
public network = 10.20.8.0/22
cluster network = 10.20.4.0/22
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 1024
osd pool default pgp num = 1024
osd crush chooseleaf type = 1
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
rbd default format = 2

##ceph35 osds
[osd.0]
cluster addr = 10.20.4.35
[osd.1]
cluster addr = 10.20.4.35
[osd.2]
cluster addr = 10.20.4.35
[osd.3]
cluster addr = 10.20.4.36
[osd.4]
cluster addr = 10.20.4.36
[osd.5]
cluster addr = 10.20.4.36

##ceph25 osds
[osd.6]
cluster addr = 10.20.4.25
public addr = 10.20.8.25
[osd.7]
cluster addr = 10.20.4.25
public addr = 10.20.8.25
[osd.8]
cluster addr = 10.20.4.25
public addr = 10.20.8.25
[osd.9]
cluster addr = 10.20.4.26
public addr = 10.20.8.26
[osd.10]
cluster addr = 10.20.4.26
public addr = 10.20.8.26
[osd.11]
cluster addr = 10.20.4.26
public addr = 10.20.8.26

##ceph15 osds
[osd.12]
cluster addr = 10.20.4.15
public addr = 10.20.8.15
[osd.13]
cluster addr = 10.20.4.15
public addr = 10.20.8.15
[osd.14]
cluster addr = 10.20.4.15
public addr = 10.20.8.15
[osd.15]
cluster addr = 10.20.4.16
public addr = 10.20.8.16

##ceph30 osds
[osd.16]
cluster addr = 10.20.4.30
public addr = 10.20.8.30
[osd.17]
cluster addr = 10.20.4.30
public addr = 10.20.8.30
[osd.18]
cluster addr = 10.20.4.30
public addr = 10.20.8.30
[osd.19]
cluster addr = 10.20.4.31
public addr = 10.20.8.31
[osd.20]
cluster addr = 10.20.4.31
public addr = 10.20.8.31
[osd.21]
cluster addr = 10.20.4.31
public addr = 10.20.8.31

[mon.ceph35]
host = ceph35
mon addr = 10.20.8.35:6789
[mon.ceph30]
host = ceph30
mon addr = 10.20.8.30:6789
[mon.ceph25]
host = ceph25
mon addr = 10.20.8.25:6789
[mon.ceph15]
host = ceph15
mon addr = 10.20.8.15:6789
================

Regards,
Mateusz
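P.S. Regarding the benchmark question: I have not run anything yet. Unless you suggest something better, I plan to run something like the commands below from one of the hosts and post the results (the "rbd" pool name is just an example here), plus fio inside one of the VMs for the guest-side numbers:

# 60 s write test, keeping the objects so the read test has data
rados bench -p rbd 60 write --no-cleanup
# 60 s sequential read test on the objects written above
rados bench -p rbd 60 seq
# remove the benchmark objects afterwards
rados -p rbd cleanup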