On 04/13/2015 02:25 AM, Christian Balzer wrote:
> On Sun, 12 Apr 2015 14:37:56 -0700 Gregory Farnum wrote:
>
>> On Sun, Apr 12, 2015 at 1:58 PM, Francois Lafont <flafdivers@xxxxxxx>
>> wrote:
>>> Somnath Roy wrote:
>>>
>>>> Interesting scenario :-).. IMHO, I don't think the cluster will be
>>>> in a healthy state here if the connection between dc1 and dc2 is
>>>> cut. The reason is the following.
>>>>
>>>> 1. Only osd.5 can talk to the OSDs in both data centers; the other
>>>> 2 mons will not be able to. So, they can't reach an agreement (and
>>>> form a quorum) about the state of the OSDs.
>>>>
>>>> 2. The OSDs in dc1 and dc2 will not be able to talk to each other;
>>>> with replicas spread across data centers, the cluster will be
>>>> broken.
>>>
>>> Yes, in fact, after some thought, I have the first question below.
>>>
>>> If (clearer with a schema in the head ;)):
>>>
>>> 1. mon.1 and mon.2 can talk together (in dc1) and can talk with
>>> mon.5 (via the VPN) but can't talk with mon.3 and mon.4 (in dc2)
>>> 2. mon.3 and mon.4 can talk together (in dc2) and can talk with
>>> mon.5 (via the VPN) but can't talk with mon.1 and mon.2 (in dc1)
>>> 3. mon.5 can talk with mon.1, mon.2, mon.3 and mon.4
>>>
>>> is a quorum reached? If yes, which monitors form it?
>>
>> Yes, you should get a quorum, as mon.5 will vote for one datacenter or
>> the other. Which one it chooses will depend on which monitor has the
>> "lowest" IP address (I think, or maybe just the monitor IDs or
>> something? Anyway, it's a consistent ordering).
>
> Pet peeve alert. ^_-
>
> It's the lowest IP.

To be more precise, it's the lowest IP:PORT combination:

10.0.1.2:6789 = rank 0
10.0.1.2:6790 = rank 1
10.0.1.3:6789 = rank 2

and so on.

> Which is something that really needs to be documented (better) so that
> people can plan things accordingly and have the leader monitor wind up
> on the best-suited hardware (in case not everything is equal).
>
> Other than that, the order in which the (initial?) mons are listed in
> ceph.conf would of course be the most natural, expected way to sort
> monitors.

I don't agree. I find it hard to rely on ceph.conf for sensitive
decisions like this, because we must ensure that ceph.conf is the same
on all the nodes, and I've seen that not be the case more often than
not.

On the other hand, I do agree that we should make it easier for people
to specify which monitors they want in the line of succession to the
leader, so that they can plan their clusters accordingly.

I do believe we can set this on the monmap, ideally once the first
quorum is formed; something like:

ceph mon rank set mon.a 0
ceph mon rank set mon.b 2
ceph mon rank set mon.c 1

ceph mon rank list
MON    IP:PORT         RANK    POLICY          STATUS
mon.a  10.0.1.2:6789   rank 0  [set-by-user]   leader
mon.c  10.0.1.3:6789   rank 1  [set-by-user]   peon
mon.b  10.0.1.2:6790   rank 2  [set-by-user]   down
mon.d  10.0.1.4:6789   rank 3  [default]       peon

Thoughts?

  -Joao

>
> Christian
>
>
>> Under no circumstances whatsoever will mon.5 help each datacenter
>> create its own quorum at the same time. The other data center will
>> just be out of luck and unable to do anything.
>> Although it's possible that the formed quorum won't be very stable,
>> since the out-of-quorum monitors will probably keep trying to form a
>> quorum and that might make mon.5 unhappy. You should test what
>> happens with that kind of net split.

:)
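For what it's worth, here's a quick toy sketch (plain Python, not Ceph
code) of Francois' split. The addresses are made up, and the election
is grossly simplified to "everyone backs the lowest-ranked monitor it
can still reach" -- the real machinery is more involved -- but it shows
why only one datacenter ends up with a quorum and how the lowest
IP:PORT ordering decides which one:

#!/usr/bin/env python3
# Toy model of the dc1/dc2 split -- NOT Ceph's actual election code.
from collections import Counter

mons = {
    "mon.1": ("10.0.1.2", 6789),   # dc1
    "mon.2": ("10.0.1.3", 6789),   # dc1
    "mon.3": ("10.0.2.2", 6789),   # dc2
    "mon.4": ("10.0.2.3", 6789),   # dc2
    "mon.5": ("10.0.3.2", 6789),   # third site, still sees everyone
}

# Ranks 0..n-1 in ascending IP:PORT order (string comparison is good
# enough for these example addresses; the real code compares the
# actual addresses in the monmap).
rank = {name: r for r, (name, _) in
        enumerate(sorted(mons.items(), key=lambda kv: kv[1]))}

# Who can still reach whom once the dc1 <-> dc2 link is cut.
reachable = {
    "mon.1": {"mon.1", "mon.2", "mon.5"},
    "mon.2": {"mon.1", "mon.2", "mon.5"},
    "mon.3": {"mon.3", "mon.4", "mon.5"},
    "mon.4": {"mon.3", "mon.4", "mon.5"},
    "mon.5": set(mons),
}

# Grossly simplified election: each monitor backs the lowest-ranked
# monitor it can still reach; a majority of *all* monitors must agree.
votes = {m: min(reachable[m], key=rank.get) for m in mons}
tally = Counter(votes.values())
leader, supporters = tally.most_common(1)[0]
majority = len(mons) // 2 + 1              # 3 out of 5

if supporters >= majority:
    quorum = sorted(m for m, v in votes.items() if v == leader)
    print("quorum:", quorum, "- leader:", leader,
          "(%d/%d votes)" % (supporters, len(mons)))
else:
    print("no quorum")
# -> quorum: ['mon.1', 'mon.2', 'mon.5'] - leader: mon.1 (3/5 votes)

In this model dc2's monitors keep backing mon.3 and never get past 2/5
votes, which is the "keep trying to form a quorum" noise Greg mentions.
And, until something like "ceph mon rank set" exists, you can at least
see the resulting ranks with "ceph mon dump" (and the current quorum
with "ceph quorum_status"); you just can't change them independently of
the monitors' addresses, hence the proposal above.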
>> -Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com