Production System Evaluation / Problems

Hey guys,

 

we’re evaluating Ceph at the moment for a bigger production-ready implementation. So far we’ve had some success and some problems with it. In combination with Proxmox, Ceph works quite well out of the box. I’ve tried to cover my questions with existing answers and solutions, but I still find some things unclear. Here are the things I’m having problems with:

 

1.       The first question is just for my understanding: how does Ceph handle failure domains? From what I’ve read so far, I create a new CRUSH map with, for example, 2 datacenters; each DC has a rack, and in that rack a chassis with nodes. With my own CRUSH map, Ceph will "see" this hierarchy and distribute the data automatically. What I am missing is some finer control. For example, with a replica count of 3 I want Ceph to store the data twice in datacenter A and once in datacenter B. Furthermore, I want read access to stay within one datacenter (if possible and the data is available there) to keep the RTT low. Is this possible? A sketch of the kind of rule I have in mind is below.
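
Something like the following is what I imagine the placement rule would look like. This is only a sketch: the rule name is made up, "blade" is the custom host-level type from our hierarchy further down, and I’m not sure the firstn values are right.

    rule replicated_dc_split {
            ruleset 1
            type replicated
            min_size 3
            max_size 3
            # pick 2 OSDs on different blades in datacenter1 ...
            step take datacenter1
            step chooseleaf firstn 2 type blade
            step emit
            # ... and the rest from datacenter2 (firstn -2 = pool size minus 2, i.e. 1 here)
            step take datacenter2
            step chooseleaf firstn -2 type blade
            step emit
    }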

2.       I’ve built my own CRUSH map and tried to get it working, with no success at all. I’m literally "done with this s…" :) and that’s why I’m here right now. (How I edited and injected the map is sketched below, after the tree.) Here is the state of the cluster:

 

    cluster 42f04e55-0a3f-4644-8543-516cd46cd4e9
     health HEALTH_WARN
            79 pgs degraded
            262 pgs stale
            79 pgs stuck degraded
            262 pgs stuck stale
            512 pgs stuck unclean
            79 pgs stuck undersized
            79 pgs undersized
     monmap e8: 6 mons at {0=192.168.40.20:6789/0,1=192.168.40.21:6789/0,2=192.168.40.22:6789/0,3=192.168.40.23:6789/0,4=192.168.40.24:6789/0,5=192.168.40.25:6789/0}
            election epoch 86, quorum 0,1,2,3,4,5 0,1,2,3,4,5
     mdsmap e2: 0/0/1 up
     osdmap e212: 6 osds: 5 up, 5 in; 250 remapped pgs
      pgmap v366013: 512 pgs, 2 pools, 0 bytes data, 0 objects
            278 MB used, 900 GB / 901 GB avail
                 250 active+remapped
                 183 stale+active+remapped
                  79 stale+active+undersized+degraded+remapped

 

Here is the config:

ID  WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-27 1.07997 root default
-25 0.53998     datacenter datacenter1
-23 0.53998         chassis chassis1
 -1 0.17999             blade blade3
  0 0.17999                 osd.0         down        0          1.00000
 -2 0.17999             blade blade4
  1 0.17999                 osd.1           up  1.00000          1.00000
 -3 0.17999             blade blade5
  2 0.17999                 osd.2           up  1.00000          1.00000
-26 0.53999     datacenter datacenter2
-24 0.53999         chassis chassis2
-17 0.17999             blade blade17
  3 0.17999                 osd.3           up  0.95001          1.00000
-18 0.17999             blade blade18
  4 0.17999                 osd.4           up  1.00000          1.00000
-19 0.17999             blade blade19
  5 0.17999                 osd.5           up  1.00000          1.00000
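
For completeness, this is roughly how I edited and injected the map; typed from memory, and the file names are just examples:

    # dump the current CRUSH map and decompile it to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt (buckets and rule), then recompile and inject it
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new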

 

I simply can’t get osd.0 back up. I took it offline, marked it out, reinserted it, set it up again, deleted the OSD configs and recreated them, with no success whatsoever. IMHO the documentation on this part is a bit "lousy", so I’m missing some information here, sorry folks. The rough procedure I followed is sketched below.
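
Again from memory, and with placeholder device paths, this is approximately what I did to remove and recreate the OSD:

    # stop the daemon and remove the OSD from the cluster, CRUSH map and auth
    service ceph stop osd.0        # or: systemctl stop ceph-osd@0
    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0
    # recreate it (/dev/sdb is a placeholder; on Proxmox, pveceph createosd also works)
    ceph-disk prepare /dev/sdb
    ceph-disk activate /dev/sdb1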

 

3.       Last but not least, I would like to know whether it is a good idea to instead run the data network and the config network on 2 dedicated NICs in 2 dedicated VLANs. Our hardware is redundant and we have 10 Gbit fibre optics in-house and 80 Gbit between the two datacenters. The data VLAN uses jumbo frames while the others don’t. A ceph.conf sketch of what I mean is below.
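
In ceph.conf terms, this is what I mean; the cluster subnet is a placeholder, only the public one matches our mons:

    [global]
        # client / "config" traffic (mons, clients) on its own VLAN
        public network  = 192.168.40.0/24
        # replication / "data" traffic on a dedicated VLAN with jumbo frames
        cluster network = 192.168.41.0/24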

 

4.       Do you guys have some kind of "best practice" book available for large-scale deployments? 20+ servers, up to 100+ and 1000+?

 

Regards

 

Florian

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
