Hi, thanks for the reply.

> From the top of my head, it is recommended to use 3 mons in
> production. Also, for the 22 osds your number of PGs look a bit low,
> you should look at that.

I took the number from http://ceph.com/docs/master/rados/operations/placement-groups/:
(22 OSDs * 100) / 3 replicas = 733.33, rounded up to the next power of two = 1024 PGs.
Please correct me if I'm wrong.

There will be 5 mons (on 6 hosts), but first we have to migrate some data off the servers that are still in use.

> The performance of the cluster is poor - this is too vague. What is
> your current performance, what benchmarks have you tried, what is your
> data workload and most importantly, how is your cluster setup. what
> disks, ssds, network, ram, etc.
>
> Please provide more information so that people could help you.
>
> Andrei

Hardware information:

ceph15:
RAM: 4 GB
Network: 4x 1 GbE NIC
OSD disks:
2x SATA Seagate ST31000524NS
2x SATA WDC WD1003FBYX-18Y7B0

ceph25:
RAM: 16 GB
Network: 4x 1 GbE NIC
OSD disks:
2x SATA WDC WD7500BPKX-7
2x SATA WDC WD7500BPKX-2
2x SATA SSHD ST1000LM014-1EJ164

ceph30:
RAM: 16 GB
Network: 4x 1 GbE NIC
OSD disks:
6x SATA SSHD ST1000LM014-1EJ164

ceph35:
RAM: 16 GB
Network: 4x 1 GbE NIC
OSD disks:
6x SATA SSHD ST1000LM014-1EJ164

All journals are on the OSD disks. On each host, 2 NICs carry the backend (cluster) network (10.20.4.0/22) and 2 NICs carry the frontend (public) network (10.20.8.0/22).

We use this cluster as the storage backend for fewer than 100 VMs on KVM. I have not run any benchmarks yet (see the P.S. at the end). All the VMs were migrated from Xen + GlusterFS (NFS); before the migration every VM ran fine, but now each VM hangs for a few seconds from time to time, and applications on the VMs take much longer to load. The GlusterFS setup ran on 2 servers, each with 1x 1 GbE NIC and 8x WDC WD7500BPKX-7 disks.

I did one recovery test: when a disk is marked out, recovery I/O runs at 150-200 MB/s, but all the VMs hang until the recovery ends.
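So far I have not touched any recovery or backfill settings, everything is at the defaults. If throttling recovery is the right way to keep the VMs responsive during recovery, I was thinking about something like this in the [osd] section (the option names are the standard Ceph ones, but the values are only my first guess, not something I have tested yet):

[osd]
# values below are just an initial guess, not tested
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1

or setting it at runtime with:

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

Please tell me if that is a sensible starting point for a cluster of this size.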
The biggest load is on ceph35: each disk does close to 150 IOPS and the CPU load is around 4-5. On the other hosts the CPU load is below 2, with roughly 120-130 IOPS per disk.

Our ceph.conf
===========
[global]
fsid = a9d17295-62f2-46f6-8325-1cad7724e97f
mon initial members = ceph35, ceph30, ceph25, ceph15
mon host = 10.20.8.35, 10.20.8.30, 10.20.8.25, 10.20.8.15
public network = 10.20.8.0/22
cluster network = 10.20.4.0/22
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 1024
osd pool default pgp num = 1024
osd crush chooseleaf type = 1
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
rbd default format = 2

##ceph35 osds
[osd.0]
cluster addr = 10.20.4.35
[osd.1]
cluster addr = 10.20.4.35
[osd.2]
cluster addr = 10.20.4.35
[osd.3]
cluster addr = 10.20.4.36
[osd.4]
cluster addr = 10.20.4.36
[osd.5]
cluster addr = 10.20.4.36

##ceph25 osds
[osd.6]
cluster addr = 10.20.4.25
public addr = 10.20.8.25
[osd.7]
cluster addr = 10.20.4.25
public addr = 10.20.8.25
[osd.8]
cluster addr = 10.20.4.25
public addr = 10.20.8.25
[osd.9]
cluster addr = 10.20.4.26
public addr = 10.20.8.26
[osd.10]
cluster addr = 10.20.4.26
public addr = 10.20.8.26
[osd.11]
cluster addr = 10.20.4.26
public addr = 10.20.8.26

##ceph15 osds
[osd.12]
cluster addr = 10.20.4.15
public addr = 10.20.8.15
[osd.13]
cluster addr = 10.20.4.15
public addr = 10.20.8.15
[osd.14]
cluster addr = 10.20.4.15
public addr = 10.20.8.15
[osd.15]
cluster addr = 10.20.4.16
public addr = 10.20.8.16

##ceph30 osds
[osd.16]
cluster addr = 10.20.4.30
public addr = 10.20.8.30
[osd.17]
cluster addr = 10.20.4.30
public addr = 10.20.8.30
[osd.18]
cluster addr = 10.20.4.30
public addr = 10.20.8.30
[osd.19]
cluster addr = 10.20.4.31
public addr = 10.20.8.31
[osd.20]
cluster addr = 10.20.4.31
public addr = 10.20.8.31
[osd.21]
cluster addr = 10.20.4.31
public addr = 10.20.8.31

[mon.ceph35]
host = ceph35
mon addr = 10.20.8.35:6789
[mon.ceph30]
host = ceph30
mon addr = 10.20.8.30:6789
[mon.ceph25]
host = ceph25
mon addr = 10.20.8.25:6789
[mon.ceph15]
host = ceph15
mon addr = 10.20.8.15:6789
================

Regards,
Mateusz
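P.S. Regarding the benchmark question: I have not run anything yet. Unless you suggest something better, I plan to run something like the commands below from one of the hosts and post the results (the "rbd" pool name is just an example here), plus fio inside one of the VMs for the guest-side numbers:

# 60 s write test, keeping the objects so the read test has data
rados bench -p rbd 60 write --no-cleanup
# 60 s sequential read test on the objects written above
rados bench -p rbd 60 seq
# remove the benchmark objects afterwards
rados -p rbd cleanup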