On Tue, Jan 16, 2018 at 2:17 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> On Tue, Jan 16, 2018 at 6:07 AM Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
> wrote:
>>
>> I found a few WAN RBD cluster design discussions, but not a local one,
>> so was wondering if anyone has experience with a resilience-oriented
>> short-distance (<10 km, redundant fiber connections) cluster in two
>> datacenters, with a third site for quorum purposes only?
>>
>> I can see two types of scenarios:
>>
>> 1. Two (or an even number of) OSD nodes at each site, 4x replication
>> (size 4, min_size 2). Three MONs, one at each site, to handle split
>> brain.
>>
>> Question: How does the cluster handle the loss of communication
>> between the OSD sites A and B while both can communicate with the
>> quorum site C? It seems one of the sites should suspend, as OSDs
>> will not be able to communicate between sites.
>
> Sadly this won't work — the OSDs on each side will report their peers on
> the other side down, but both will be able to connect to a live monitor.
> (Assuming the quorum site holds the leader monitor, anyway — if one of
> the main sites holds what should be the leader, you'll get into a
> monitor election storm instead.) You'll need your own netsplit
> monitoring to shut down one site if that kind of connection cut is a
> possibility.

What about running a split-brain-aware tool, such as Pacemaker, and
running a copy of the same VM as a MON at each site? In case of a
split-brain network separation, Pacemaker would (aware via the third
site) stop the MON on site A and bring up the MON on site B (or whatever
the rules are set to). I read earlier that a MON with the same IP, name,
and keyring would just look to the Ceph cluster like a very old MON, but
would still be able to vote for quorum.

Vincent Godin also described an HSRP-based method, which would
accomplish this goal via network routing. That seems like a good
approach too; I just need to check on HSRP availability.

>
>>
>> 2.
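For scenario 1, a rough sketch of the pool and CRUSH settings that would
place two replicas at each site (assuming a CRUSH hierarchy that defines
a "datacenter" bucket type containing the two OSD sites; the pool name
"rbd" and rule name "stretch_rule" are just examples, and the pool-set
syntax assumes Luminous or later — this is untested against a real
cluster):

```shell
# Example CRUSH rule: pick 2 datacenters, then 2 hosts within each,
# yielding 4 replicas split 2+2 across sites A and B.
# (Compile into the cluster's CRUSH map with crushtool / setcrushmap.)
cat > stretch_rule.txt <<'EOF'
rule stretch_rule {
    ruleset 1
    type replicated
    min_size 2
    max_size 4
    step take default
    step choose firstn 2 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}
EOF

# With size 4 / min_size 2, the pool keeps serving I/O when one
# entire site (2 of the 4 replicas) is lost.
ceph osd pool set rbd size 4
ceph osd pool set rbd min_size 2
ceph osd pool set rbd crush_rule stretch_rule
```

Note that this only controls data placement; as Greg points out, it does
nothing about the OSD flapping reports during a netsplit, which is why
external fencing (Pacemaker or similar) is still needed.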
>> 3x replication for performance or cost (size 3, min_size 2, or
>> even min_size 1 and strict monitoring). Two replicas and two MONs at
>> one site, and one replica and one MON at the other site.
>>
>> Question: in case of a permanent failure of the main site (with two
>> replicas), how to manually force the other site (with one replica and
>> one MON) to provide storage? I would think a CRUSH map change and
>> modifying ceph.conf to include just one MON, then build two more MONs
>> locally and add?
>
> Yep, pretty much that. You won't need to change ceph.conf to just one
> mon so much as to include the current set of mons and update the
> monmap. I believe that process is in the disaster recovery section of
> the docs.

Thank you.

Alex

> -Greg
>
>> --
>> Alex Gorbachev
>> Storcium

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
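For reference, the monmap surgery Greg mentions (from the Ceph monitor
disaster-recovery docs) looks roughly like the following; the mon IDs
"a" and "b" (lost site) and "c" (surviving site) are placeholders, and
this should be rehearsed on a test cluster first:

```shell
# Stop the surviving monitor and pull its current monmap.
systemctl stop ceph-mon@c
ceph-mon -i c --extract-monmap /tmp/monmap

# Remove the unreachable monitors so mon.c can form quorum by itself.
monmaptool /tmp/monmap --rm a --rm b

# Inject the edited map back and restart the surviving monitor.
ceph-mon -i c --inject-monmap /tmp/monmap
systemctl start ceph-mon@c
```

Once mon.c has quorum on its own, two fresh MONs can be deployed at the
surviving site and will join via the normal add-monitor procedure.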