Hi,

1. It is possible to do that with the primary affinity setting. The documentation gives an example with SSDs as primary OSDs and HDDs as secondaries. I think it would work for an Active/Passive DC scenario, but it might be tricky for Active/Active. If you run Ceph across 2 DCs you might have problems with quorum; a third location with 1 MON can help break ties.
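For the Active/Passive case, this is roughly what it looks like on the CLI (an untested sketch; I am assuming datacenter2 is the passive site, so adjust the OSD ids to whatever ends up there):

    # Jewel needs this enabled before primary affinity has any effect
    # (also put "mon osd allow primary affinity = true" in ceph.conf so it survives restarts)
    ceph tell mon.* injectargs '--mon_osd_allow_primary_affinity=true'

    # keep the OSDs in the passive DC from being chosen as primaries
    ceph osd primary-affinity osd.3 0
    ceph osd primary-affinity osd.4 0
    ceph osd primary-affinity osd.5 0

Reads are served by the primaries, so they stay in datacenter1, while datacenter2 still holds full copies for failover.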
2. Zap & re-create?
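In more detail, the Jewel-era sequence I would try for throwing osd.0 away completely and re-creating it (only a sketch; /dev/sdb is a placeholder for whatever disk backs osd.0 on blade3):

    # remove the broken OSD from the cluster entirely
    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0

    # then, on blade3: wipe the disk and build a fresh OSD on it
    ceph-disk zap /dev/sdb
    ceph-disk prepare /dev/sdb

ceph-disk prepare will give the new OSD a fresh id; since your map uses "blade" buckets instead of the default "host" type, you may need to move it under blade3 in the CRUSH map afterwards.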
3. It is common to use 2 VLANs on a LACP bond instead of 1 NIC per VLAN. You just need to size the pipes accordingly to avoid bottlenecks.
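For reference, the Ceph side of that is just two options in ceph.conf. A minimal sketch (the VLAN ids and the second subnet are made up; the 192.168.40.0/24 is only taken from your monmap):

    [global]
    public network  = 192.168.40.0/24    # MON/client traffic, e.g. VLAN 40 on the bond
    cluster network = 192.168.41.0/24    # OSD replication/backfill traffic, e.g. VLAN 41 on the same bond

Both VLAN interfaces can sit on the same LACP bond; just make sure the bond itself is sized for replication plus client traffic.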
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Stefan Lissmats <stefan@xxxxxxxxxx>

Hey!
I have been using Ceph for a while but am not a real expert; still, I will give you some pointers so that everyone can help you further.

1. The CRUSH map is divided into two parts: the topology description (which you provided us with) and the CRUSH rules that define how the data is placed in that topology. Have you made any changes to the rules? If you have, it would be great if you showed how the rules are defined. I think you can get the data placed the way you want with some more advanced CRUSH rules, but I don't think there is any way to have a read-only copy. Guess you have seen this? http://docs.ceph.com/docs/jewel/rados/operations/crush-map/
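For example, with your topology a rule along these lines (an untested sketch; the name and ruleset number are arbitrary) would put two copies in each datacenter, on different blades, for a pool with size 4:

    rule replicated_two_dcs {
            ruleset 1
            type replicated
            min_size 2
            max_size 4
            step take default
            step choose firstn 2 type datacenter
            step chooseleaf firstn 2 type blade
            step emit
    }

You would compile the edited map with crushtool, load it with "ceph osd setcrushmap -i", and then point the pool at it with "ceph osd pool set <pool> crush_ruleset 1" and "ceph osd pool set <pool> size 4".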
2. Have you looked into the OSD log on the server that osd.0 resides on? That could give some information on why osd.0 never comes up. It should normally be in /var/log/ceph/ceph-osd.0.log.

Other notes: You have 6 MONs, but you normally want an odd number, and you do not normally need more than 5 (even 3 is usually enough).
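If you do want to drop to five (or three), removing a MON is a one-liner; a sketch, using mon "5" only because it is the last one in your monmap:

    ceph mon remove 5
    # then stop/disable the ceph-mon daemon on that host and take it out of ceph.conf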
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Strankowski, Florian [FStrankowski@xxxxxxxxxxxxxxxxxxxxxxxxx]

Hey guys,

we're evaluating Ceph at the moment for a bigger production-ready implementation. So far we've had some success and some problems with Ceph. In combination with Proxmox, Ceph works quite well if taken out of the box. I've tried to cover my questions with existing answers and solutions, but I still find some things unclear. Here are the things I'm having problems with:
1. The first question is just for my understanding: how does Ceph account for failure domains? From what I've read so far it is ...
2. I've built my own CRUSH map and tried to get it working. No success at all. I'm literally „done with this s…“
:-) That's why I'm here right now. Here is the state of the cluster:

    cluster 42f04e55-0a3f-4644-8543-516cd46cd4e9
     health HEALTH_WARN
            79 pgs degraded
            262 pgs stale
            79 pgs stuck degraded
            262 pgs stuck stale
            512 pgs stuck unclean
            79 pgs stuck undersized
            79 pgs undersized
     monmap e8: 6 mons at {0=192.168.40.20:6789/0,1=192.168.40.21:6789/0,2=192.168.40.22:6789/0,3=192.168.40.23:6789/0,4=192.168.40.24:6789/0,5=192.168.40.25:6789/0}
            election epoch 86, quorum 0,1,2,3,4,5 0,1,2,3,4,5
     mdsmap e2: 0/0/1 up
     osdmap e212: 6 osds: 5 up, 5 in; 250 remapped pgs
      pgmap v366013: 512 pgs, 2 pools, 0 bytes data, 0 objects
            278 MB used, 900 GB / 901 GB avail
                 250 active+remapped
                 183 stale+active+remapped
                  79 stale+active+undersized+degraded+remapped

Here the config:

    ID  WEIGHT  TYPE NAME                      UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -27 1.07997 root default
    -25 0.53998     datacenter datacenter1
    -23 0.53998         chassis chassis1
     -1 0.17999             blade blade3
      0 0.17999                 osd.0            down        0          1.00000
     -2 0.17999             blade blade4
      1 0.17999                 osd.1              up  1.00000          1.00000
     -3 0.17999             blade blade5
      2 0.17999                 osd.2              up  1.00000          1.00000
    -26 0.53999     datacenter datacenter2
    -24 0.53999         chassis chassis2
    -17 0.17999             blade blade17
      3 0.17999                 osd.3              up  0.95001          1.00000
    -18 0.17999             blade blade18
      4 0.17999                 osd.4              up  1.00000          1.00000
    -19 0.17999             blade blade19
      5 0.17999                 osd.5              up  1.00000          1.00000

I simply can't get osd.0 back up. I took it offline, out, reinserted it, set it up again, deleted the OSD configs, remade them; no success whatsoever. IMHO the documentation on this part is a bit „lousy“, so I'm missing some points of information here, sorry folks.
3. Last but not least, I would like to know whether it is a good idea to have the data and config networks on 2 dedicated NICs on 2 dedicated VLANs instead. Our hardware is redundant and we have 10 Gbit fibre optics in-house and 80 Gbit between the two datacenters. The data VLAN uses jumbo frames while the others don't.
4. Do you guys have some kind of „best practice“ book available for large-scale deployments? 20+ servers, up to 100+ and 1000+.
Regards,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com