I really like this proposal. (A quick sketch of the rank-by-address
ordering discussed below is appended at the end of this mail.)

On Mon, Apr 13, 2015 at 2:33 AM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
> On 04/13/2015 02:25 AM, Christian Balzer wrote:
>> On Sun, 12 Apr 2015 14:37:56 -0700 Gregory Farnum wrote:
>>
>>> On Sun, Apr 12, 2015 at 1:58 PM, Francois Lafont <flafdivers@xxxxxxx>
>>> wrote:
>>>> Somnath Roy wrote:
>>>>
>>>>> Interesting scenario :-) ... IMHO, I don't think the cluster will be
>>>>> in a healthy state here if the connection between dc1 and dc2 is cut.
>>>>> The reason is the following.
>>>>>
>>>>> 1. Only osd.5 can talk to the OSDs in both data centers, and the
>>>>> other 2 mons will not be able to. So they can't reach an agreement
>>>>> (and form a quorum) about the state of the OSDs.
>>>>>
>>>>> 2. The OSDs in dc1 and dc2 will not be able to talk to each other;
>>>>> with replicas across data centers, the cluster will be broken.
>>>>
>>>> Yes, in fact, after some thought, I have this first question.
>>>>
>>>> If (clearer with the schema at the head of this thread ;)):
>>>>
>>>> 1. mon.1 and mon.2 can talk together (in dc1) and can talk with
>>>> mon.5 (via the VPN) but can't talk with mon.3 and mon.4 (in dc2)
>>>> 2. mon.3 and mon.4 can talk together (in dc2) and can talk with
>>>> mon.5 (via the VPN) but can't talk with mon.1 and mon.2 (in dc1)
>>>> 3. mon.5 can talk with mon.1, mon.2, mon.3, mon.4 and mon.5
>>>>
>>>> is a quorum reached? If yes, which one is it?
>>>
>>> Yes, you should get a quorum, as mon.5 will vote for one datacenter or
>>> the other. Which one it chooses will depend on which monitor has the
>>> "lowest" IP address (I think, or maybe just the monitor IDs or
>>> something? Anyway, it's a consistent ordering).
>>
>> Pet peeve alert. ^_-
>>
>> It's the lowest IP.
>
> To be more precise, it's the lowest IP:PORT combination:
>
> 10.0.1.2:6789 = rank 0
> 10.0.1.2:6790 = rank 1
> 10.0.1.3:6789 = rank 2
>
> and so on.
>
>> Which is something that really needs to be documented (better), so that
>> people can plan things accordingly and have the leader monitor wind up
>> on the best-suited hardware (in case not everything is equal).
>>
>> Other than that, the order in which the (initial?) mons are listed in
>> ceph.conf would of course be the most natural, expected way to sort
>> monitors.
>
> I don't agree. I find it hard to rely on ceph.conf for sensitive
> decisions like this, because we must ensure that ceph.conf is the same
> on all the nodes, and I've seen that not be the case more often than
> not.
>
> On the other hand, I do agree that we should make it easier for people
> to specify which monitors they want in the line of succession to the
> leader, so that they can plan their clusters accordingly. I believe we
> can set this in the monmap, ideally once the first quorum is formed;
> something like:
>
> ceph mon rank set mon.a 0
> ceph mon rank set mon.b 2
> ceph mon rank set mon.c 1
>
> ceph mon rank list
>
> MON      IP:PORT          RANK      POLICY           STATUS
> mon.a    10.0.1.2:6789    rank 0    [set-by-user]    leader
> mon.c    10.0.1.3:6789    rank 1    [set-by-user]    peon
> mon.b    10.0.1.2:6790    rank 2    [set-by-user]    down
> mon.d    10.0.1.4:6789    rank 3    [default]        peon
>
> Thoughts?
>
>   -Joao
>
>>
>> Christian
>>
>>
>>> Under no circumstances
>>> whatsoever will mon.5 help each datacenter create its own quorum at
>>> the same time. The other data center will just be out of luck and
>>> unable to do anything.
>>> Although it's possible that the formed quorum won't be very stable,
>>> since the out-of-quorum monitors will probably keep trying to form a
>>> quorum and that might make mon.5 unhappy.
>>> You should test what happens
>>> with that kind of net split. :)
>>> -Greg
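
P.S. For anyone who wants to play with the ordering Joao describes, here
is a minimal Python sketch of the idea. This is not the actual monitor
election code (that lives in C++ inside ceph-mon, and the real address
comparison may differ in detail); it only illustrates "lowest IP:PORT
gets rank 0", using the addresses from Joao's example:

import ipaddress

def mon_rank_order(addrs):
    # Sort "ip:port" strings so that the lowest IP, then the lowest
    # port, comes first -- that entry corresponds to rank 0.
    def key(addr):
        ip, port = addr.rsplit(":", 1)
        return (ipaddress.ip_address(ip), int(port))
    return sorted(addrs, key=key)

mons = ["10.0.1.3:6789", "10.0.1.2:6790", "10.0.1.2:6789"]
for rank, addr in enumerate(mon_rank_order(mons)):
    print("rank %d: %s" % (rank, addr))
# prints:
# rank 0: 10.0.1.2:6789
# rank 1: 10.0.1.2:6790
# rank 2: 10.0.1.3:6789

On a running cluster you can check the ranks that were actually assigned
with "ceph mon dump" and see which monitor currently leads the quorum
with "ceph quorum_status". In Francois' scenario, that ordering is what
decides which data center mon.5 ends up siding with.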