Ceph monitor load, low performance


 



Move the journals onto an SSD and performance will improve immediately; with the
journals on the same disks as the data, roughly 50% of the write performance is
lost to the journal writes. And for three replicas, more than 5 hosts are
recommended.
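
A rough sketch of how one journal can be moved (the OSD id and the device path
are only examples here, adjust them to your setup and do one OSD at a time):

  ceph osd set noout
  service ceph stop osd.2
  ceph-osd -i 2 --flush-journal
  # point the OSD at a partition on the SSD (example path)
  ln -sf /dev/disk/by-partlabel/journal-osd2 /var/lib/ceph/osd/ceph-2/journal
  ceph-osd -i 2 --mkjournal
  service ceph start osd.2
  ceph osd unset noout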


2014-08-26 12:17 GMT+04:00 Mateusz Skała <mateusz.skala at budikom.net>:

>
> Hi thanks for reply.
>
>
>
>> Off the top of my head, it is recommended to use 3 mons in
>> production. Also, for the 22 OSDs your number of PGs looks a bit low,
>> you should look at that.
>>
> I get it from http://ceph.com/docs/master/rados/operations/placement-groups/
>
> (22 OSDs * 100) / 3 replicas = 733.33, rounded up to the next power of two = 1024 PGs.
> Please correct me if I'm wrong.
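>
> To double-check what the pools actually have right now, something like this
> should show it (the pool name is just an example):
>
>   ceph osd pool get rbd pg_num
>   ceph osd dump | grep pg_num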
>
> It will be 5 mons (on 6 hosts), but first we have to migrate some data from
> the servers currently in use.
>
>
>
>
>> The performance of the cluster is poor - this is too vague. What is
>> your current performance, what benchmarks have you tried, what is your
>> data workload and, most importantly, how is your cluster set up: what
>> disks, SSDs, network, RAM, etc.?
>>
>> Please provide more information so that people could help you.
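>>
>> For raw numbers, even a quick rados bench run against one of the pools
>> would help, e.g. (pool name is a placeholder):
>>
>>   rados bench -p <pool> 60 write --no-cleanup
>>   rados bench -p <pool> 60 seq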
>>
>> Andrei
>>
>
> Hardware information:
> ceph15:
> RAM: 4GB
> Network: 4x 1GB NIC
> OSD disks:
> 2x SATA Seagate ST31000524NS
> 2x SATA WDC WD1003FBYX-18Y7B0
>
> ceph25:
> RAM: 16GB
> Network: 4x 1GB NIC
> OSD disks:
> 2x SATA WDC WD7500BPKX-7
> 2x SATA WDC WD7500BPKX-2
> 2x SATA SSHD ST1000LM014-1EJ164
>
> ceph30
> RAM: 16GB
> Network: 4x 1GB NIC
> OSD disks:
> 6x SATA SSHD ST1000LM014-1EJ164
>
> ceph35:
> RAM: 16GB
> Network: 4x 1GB NIC
> OSD disks:
> 6x SATA SSHD ST1000LM014-1EJ164
>
>
> All journals are on the OSD disks. 2 NICs are for the backend network (10.20.4.0/22)
> and 2 NICs are for the frontend network (10.20.8.0/22).
>
> We use this cluster as the storage backend for <100 VMs on KVM. I haven't run
> benchmarks, but all the VMs were migrated from Xen+GlusterFS (NFS). Before the
> migration every VM was running fine; now each VM hangs from time to time for a
> few seconds, and applications installed on the VMs take much longer to load.
> GlusterFS was running on 2 servers with 1x 1GB NIC and 2x8 WDC WD7500BPKX-7 disks.
>
> I ran one test with recovery: when a disk is marked out, recovery I/O is
> 150-200MB/s, but all VMs hang until the recovery ends.
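>
> Should recovery be throttled in this case, e.g. with something like
> (values only as an example):
>
>   ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'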
>
> The biggest load is on ceph35: IOPS on each disk are near 150 and CPU load is ~4-5.
> On the other hosts CPU load is <2 and the disks do 120-130 IOPS.
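>
> I can also collect per-OSD latencies if that helps, e.g. with (if this ceph
> version has the first command):
>
>   ceph osd perf
>   iostat -x 5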
>
> Our ceph.conf
>
> ===========
> [global]
>
> fsid = a9d17295-62f2-46f6-8325-1cad7724e97f
> mon initial members = ceph35, ceph30, ceph25, ceph15
> mon host = 10.20.8.35, 10.20.8.30, 10.20.8.25, 10.20.8.15
> public network = 10.20.8.0/22
> cluster network = 10.20.4.0/22
> osd journal size = 1024
> filestore xattr use omap = true
> osd pool default size = 3
> osd pool default min size = 1
> osd pool default pg num = 1024
> osd pool default pgp num = 1024
> osd crush chooseleaf type = 1
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> rbd default format = 2
>
> ##ceph35 osds
> [osd.0]
> cluster addr = 10.20.4.35
> [osd.1]
> cluster addr = 10.20.4.35
> [osd.2]
> cluster addr = 10.20.4.35
> [osd.3]
> cluster addr = 10.20.4.36
> [osd.4]
> cluster addr = 10.20.4.36
> [osd.5]
> cluster addr = 10.20.4.36
>
> ##ceph25 osds
> [osd.6]
> cluster addr = 10.20.4.25
> public addr = 10.20.8.25
> [osd.7]
> cluster addr = 10.20.4.25
> public addr = 10.20.8.25
> [osd.8]
> cluster addr = 10.20.4.25
> public addr = 10.20.8.25
> [osd.9]
> cluster addr = 10.20.4.26
> public addr = 10.20.8.26
> [osd.10]
> cluster addr = 10.20.4.26
> public addr = 10.20.8.26
> [osd.11]
> cluster addr = 10.20.4.26
> public addr = 10.20.8.26
>
> ##ceph15 osds
> [osd.12]
> cluster addr = 10.20.4.15
> public addr = 10.20.8.15
> [osd.13]
> cluster addr = 10.20.4.15
> public addr = 10.20.8.15
> [osd.14]
> cluster addr = 10.20.4.15
> public addr = 10.20.8.15
> [osd.15]
> cluster addr = 10.20.4.16
> public addr = 10.20.8.16
>
> ##ceph30 osds
> [osd.16]
> cluster addr = 10.20.4.30
> public addr = 10.20.8.30
> [osd.17]
> cluster addr = 10.20.4.30
> public addr = 10.20.8.30
> [osd.18]
> cluster addr = 10.20.4.30
> public addr = 10.20.8.30
> [osd.19]
> cluster addr = 10.20.4.31
> public addr = 10.20.8.31
> [osd.20]
> cluster addr = 10.20.4.31
> public addr = 10.20.8.31
> [osd.21]
> cluster addr = 10.20.4.31
> public addr = 10.20.8.31
>
> [mon.ceph35]
> host = ceph35
> mon addr = 10.20.8.35:6789
> [mon.ceph30]
> host = ceph30
> mon addr = 10.20.8.30:6789
> [mon.ceph25]
> host = ceph25
> mon addr = 10.20.8.25:6789
> [mon.ceph15]
> host = ceph15
> mon addr = 10.20.8.15:6789
> ================
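>
> (If the existing pools need more PGs, I assume the way to change it would be
> something like:
>
>   ceph osd pool set <pool> pg_num 1024
>   ceph osd pool set <pool> pgp_num 1024
>
> please correct me if that's wrong.)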
>
> Regards,
>
> Mateusz
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Tel.: +79229045757

