Re: How to dispatch monitors in a multi-site cluster (ie in 2 datacenters)


 



On 04/13/2015 02:25 AM, Christian Balzer wrote:
> On Sun, 12 Apr 2015 14:37:56 -0700 Gregory Farnum wrote:
> 
>> On Sun, Apr 12, 2015 at 1:58 PM, Francois Lafont <flafdivers@xxxxxxx>
>> wrote:
>>> Somnath Roy wrote:
>>>
>>>> Interesting scenario :-).. IMHO, I don't think the cluster will be
>>>> in a healthy state here if the connection between dc1 and dc2 is
>>>> cut, for the following reasons.
>>>>
>>>> 1. Only osd.5 can talk to the OSDs in both data centers; the other
>>>> 2 mons will not be able to. So they can't reach an agreement (and
>>>> form a quorum) about the state of the OSDs.
>>>>
>>>> 2. The OSDs in dc1 and dc2 will not be able to talk to each other;
>>>> with replicas spread across data centers, the cluster will be broken.
>>>
>>> Yes, in fact, after thought, I have the first question below.
>>>
>>> If: (clearer with a diagram in mind ;))
>>>
>>>     1. mon.1 and mon.2 can talk together (in dc1) and can talk with
>>> mon.5 (via the VPN) but can't talk with mon.3 and mon.4 (in dc2)
>>>     2. mon.3 and mon.4 can talk together (in dc2) and can talk with
>>> mon.5 (via the VPN) but can't talk with mon.1 and mon.2 (in dc1)
>>>     3. mon.5 can talk with mon.1, mon.2, mon.3, mon.4 and mon.5
>>>
>>> is a quorum reached? If yes, what is the quorum?
>>
>> Yes, you should get a quorum as mon.5 will vote for one datacenter or
>> the other. Which one it chooses will depend on which monitor has the
>> "lowest" IP address (I think, or maybe just the monitor IDs or
>> something? Anyway, it's a consistent ordering). 
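The behaviour Greg describes can be sketched as follows (illustrative Python only, not Ceph code; the monmap, partition sets, and the lowest-name tie-break are stand-ins for the real consistent ordering):

```python
# Illustrative sketch only (not Ceph code): a quorum needs a strict
# majority of the monitors in the monmap, so with 5 mons that is 3.
# In the split above, mon.5 can reach both sides but joins exactly
# one of them; the other data center is left with 2 mons and no quorum.

MONMAP = ["mon.1", "mon.2", "mon.3", "mon.4", "mon.5"]

def has_quorum(members):
    """A strict majority of the monmap is required to form a quorum."""
    return len(members) > len(MONMAP) // 2

dc1 = {"mon.1", "mon.2"}          # partition in data center 1
dc2 = {"mon.3", "mon.4"}          # partition in data center 2

# mon.5 sides with exactly one partition, chosen by a consistent
# ordering (here simply the lowest mon name, as a stand-in for the
# real IP-based rule).
winner = (dc1 if min(dc1) < min(dc2) else dc2) | {"mon.5"}
loser = dc2 if dc1 <= winner else dc1

print(has_quorum(winner))   # True  -> 3 of 5, quorum formed
print(has_quorum(loser))    # False -> 2 of 5, out of luck
```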
> 
> Pet peeve alert. ^_-
> 
> It's the lowest IP.

To be more precise, it's the lowest IP:PORT combination:

10.0.1.2:6789 = rank 0
10.0.1.2:6790 = rank 1
10.0.1.3:6789 = rank 2

and so on.
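As a sketch of that ordering (illustrative Python, not the actual monitor code), sorting addresses numerically by IP octets and then by port reproduces the ranks above:

```python
# Sketch: assign monitor ranks by the lowest IP:PORT combination,
# as described above. Compares IP octets numerically, then the port.

def mon_rank_order(addrs):
    """Sort 'ip:port' strings by IP octets, then port (lowest first)."""
    def key(addr):
        ip, port = addr.split(":")
        return tuple(int(octet) for octet in ip.split(".")) + (int(port),)
    return sorted(addrs, key=key)

mons = ["10.0.1.3:6789", "10.0.1.2:6790", "10.0.1.2:6789"]
for rank, addr in enumerate(mon_rank_order(mons)):
    print(f"{addr} = rank {rank}")
# 10.0.1.2:6789 = rank 0
# 10.0.1.2:6790 = rank 1
# 10.0.1.3:6789 = rank 2
```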

> Which is something that really needs to be documented (better) so that
> people can plan things accordingly and have the leader monitor wind up
> on the best-suited hardware (in case not all hardware is equal).
> 
> Other than that, the sequence of how (initial?) mons are listed in
> ceph.conf would of course be the most natural, expected way to sort
> monitors.

I don't agree.  I find it hard to rely on ceph.conf for sensitive
decisions like this, because we must ensure that ceph.conf is identical
on all the nodes; and I've seen that not be the case more often than
not.

On the other hand, I do agree that we should make it easier for people
to specify which monitors they want in the line of succession to the
leader, so that they can plan their clusters accordingly.  I do believe
we can set this on the monmap, ideally once the first quorum is formed;
something like:

ceph mon rank set mon.a 0
ceph mon rank set mon.b 2
ceph mon rank set mon.c 1

ceph mon rank list

  MON   IP:PORT       RANK     POLICY        STATUS
  mon.a 10.0.1.2:6789 rank 0  [set-by-user]  leader
  mon.c 10.0.1.3:6789 rank 1  [set-by-user]  peon
  mon.b 10.0.1.2:6790 rank 2  [set-by-user]  down
  mon.d 10.0.1.4:6789 rank 3  [default]      peon


Thoughts?

  -Joao

> 
> Christian
> 
> 
>> Under no circumstances
>> whatsoever will mon.5 help each datacenter create their own quorums at
>> the same time. The other data center will just be out of luck and
>> unable to do anything.
>> Although it's possible that the formed quorum won't be very stable
>> since the out-of-quorum monitors will probably keep trying to form a
>> quorum and that might make mon.5 unhappy. You should test what happens
>> with that kind of net split. :)
>> -Greg
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> 




