Hi,

1. It is possible to do that with the primary affinity setting. The documentation gives an example with SSDs as primary OSDs and HDDs as secondaries. I think it would work for an Active/Passive DC scenario, but it might be tricky for Active/Active. If you run Ceph across 2 DCs you might have problems with quorum; a third location with 1 MON can help break ties.
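For the Active/Passive case, this is roughly what it looks like on the CLI (an untested sketch; I am assuming datacenter2 is the passive site, so adjust the OSD ids to whatever ends up there):

    # Jewel needs this enabled before primary affinity has any effect
    # (also put "mon osd allow primary affinity = true" in ceph.conf so it survives restarts)
    ceph tell mon.* injectargs '--mon_osd_allow_primary_affinity=true'

    # keep the OSDs in the passive DC from being chosen as primaries
    ceph osd primary-affinity osd.3 0
    ceph osd primary-affinity osd.4 0
    ceph osd primary-affinity osd.5 0

Reads are served by the primaries, so they stay in datacenter1, while datacenter2 still holds full copies for failover.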
2. Zap & re-create?
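In more detail, the Jewel-era sequence I would try for throwing osd.0 away completely and re-creating it (only a sketch; /dev/sdb is a placeholder for whatever disk backs osd.0 on blade3):

    # remove the broken OSD from the cluster entirely
    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0

    # then, on blade3: wipe the disk and build a fresh OSD on it
    ceph-disk zap /dev/sdb
    ceph-disk prepare /dev/sdb

ceph-disk prepare will give the new OSD a fresh id; since your map uses "blade" buckets instead of the default "host" type, you may need to move it under blade3 in the CRUSH map afterwards.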
3. It is common to use 2 VLANs on a LACP bond instead of 1 NIC per VLAN. You just need to size the pipes accordingly to avoid bottlenecks.
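For reference, the Ceph side of that is just two options in ceph.conf. A minimal sketch (the VLAN ids and the second subnet are made up; the 192.168.40.0/24 is only taken from your monmap):

    [global]
    public network  = 192.168.40.0/24    # MON/client traffic, e.g. VLAN 40 on the bond
    cluster network = 192.168.41.0/24    # OSD replication/backfill traffic, e.g. VLAN 41 on the same bond

Both VLAN interfaces can sit on the same LACP bond; just make sure the bond itself is sized for replication plus client traffic.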
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Stefan Lissmats <stefan@xxxxxxxxxx>

Hey!
I have been using Ceph for a while but am not a real expert; still, I will give you some pointers so that everyone can help you further.

1. The CRUSH map is divided into two parts: the topology description (which you provided us with) and the CRUSH rules that define how the data is placed in that topology. Have you made any changes to the rules? If you have, it would be great if you showed how the rules are defined. I think you can get the data placed the way you want with some more advanced CRUSH rules, but I don't think there is any way to have a read-only copy. Guess you have seen this? http://docs.ceph.com/docs/jewel/rados/operations/crush-map/
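For example, with your topology a rule along these lines (an untested sketch; the name and ruleset number are arbitrary) would put two copies in each datacenter, on different blades, for a pool with size 4:

    rule replicated_two_dcs {
            ruleset 1
            type replicated
            min_size 2
            max_size 4
            step take default
            step choose firstn 2 type datacenter
            step chooseleaf firstn 2 type blade
            step emit
    }

You would compile the edited map with crushtool, load it with "ceph osd setcrushmap -i", and then point the pool at it with "ceph osd pool set <pool> crush_ruleset 1" and "ceph osd pool set <pool> size 4".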
2. Have you looked into the OSD log on the server that osd.0 resides on? That could give some information on why osd.0 never comes up. It should normally be in /var/log/ceph/ceph-osd.0.log.

Other notes: You have 6 MONs, but you normally want an odd number, and you do not normally need more than 5 (even 3 is usually enough).
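If you do want to drop to five (or three), removing a MON is a one-liner; a sketch, using mon "5" only because it is the last one in your monmap:

    ceph mon remove 5
    # then stop/disable the ceph-mon daemon on that host and take it out of ceph.conf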
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Strankowski, Florian [FStrankowski@xxxxxxxxxxxxxxxxxxxxxxxxx]

Hey guys,

we're evaluating Ceph at the moment for a bigger production-ready implementation. So far we've had some success and some problems with Ceph. In combination with Proxmox, Ceph works quite well if taken out of the box. I've tried to cover my questions with existing answers and solutions, but I still find some things unclear. Here are the things I'm having problems with:
1. The first question is just for my understanding: how does Ceph account for failure domains? From what I've read so far it is ...
2. I've built my own CRUSH map and tried to get it working. No success at all. I'm literally „done with this s…“
:-) That's why I'm here right now. Here is the state of the cluster:

    cluster 42f04e55-0a3f-4644-8543-516cd46cd4e9
     health HEALTH_WARN
            79 pgs degraded
            262 pgs stale
            79 pgs stuck degraded
            262 pgs stuck stale
            512 pgs stuck unclean
            79 pgs stuck undersized
            79 pgs undersized
     monmap e8: 6 mons at {0=192.168.40.20:6789/0,1=192.168.40.21:6789/0,2=192.168.40.22:6789/0,3=192.168.40.23:6789/0,4=192.168.40.24:6789/0,5=192.168.40.25:6789/0}
            election epoch 86, quorum 0,1,2,3,4,5 0,1,2,3,4,5
     mdsmap e2: 0/0/1 up
     osdmap e212: 6 osds: 5 up, 5 in; 250 remapped pgs
      pgmap v366013: 512 pgs, 2 pools, 0 bytes data, 0 objects
            278 MB used, 900 GB / 901 GB avail
                 250 active+remapped
                 183 stale+active+remapped
                  79 stale+active+undersized+degraded+remapped

Here the config:

    ID  WEIGHT  TYPE NAME                      UP/DOWN REWEIGHT PRIMARY-AFFINITY
    -27 1.07997 root default
    -25 0.53998     datacenter datacenter1
    -23 0.53998         chassis chassis1
     -1 0.17999             blade blade3
      0 0.17999                 osd.0            down        0          1.00000
     -2 0.17999             blade blade4
      1 0.17999                 osd.1              up  1.00000          1.00000
     -3 0.17999             blade blade5
      2 0.17999                 osd.2              up  1.00000          1.00000
    -26 0.53999     datacenter datacenter2
    -24 0.53999         chassis chassis2
    -17 0.17999             blade blade17
      3 0.17999                 osd.3              up  0.95001          1.00000
    -18 0.17999             blade blade18
      4 0.17999                 osd.4              up  1.00000          1.00000
    -19 0.17999             blade blade19
      5 0.17999                 osd.5              up  1.00000          1.00000

I simply can't get osd.0 back up. I took it offline, out, reinserted it, set it up again, deleted the OSD configs, remade them; no success whatsoever. IMHO the documentation on this part is a bit „lousy“, so I'm missing some points of information here, sorry folks.
3. Last but not least, I would like to know whether it is a good idea to have the data and config networks on 2 dedicated NICs on 2 dedicated VLANs instead. Our hardware is redundant and we have 10 Gbit fibre optics in-house and 80 Gbit between the two datacenters. The data VLAN uses jumbo frames while the others don't.
4. Do you guys have some kind of „best practice“ book available for large-scale deployments? 20+ servers, up to 100+ and 1000+.
Regards,
Florian
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com