Re: How to dispatch monitors in a multi-site cluster (i.e. in 2 datacenters)

I really like this proposal.

On Mon, Apr 13, 2015 at 2:33 AM, Joao Eduardo Luis <joao@xxxxxxx> wrote:
> On 04/13/2015 02:25 AM, Christian Balzer wrote:
>> On Sun, 12 Apr 2015 14:37:56 -0700 Gregory Farnum wrote:
>>
>>> On Sun, Apr 12, 2015 at 1:58 PM, Francois Lafont <flafdivers@xxxxxxx>
>>> wrote:
>>>> Somnath Roy wrote:
>>>>
>>>>> Interesting scenario :-).. IMHO, I don't think the cluster will be in
>>>>> a healthy state here if the connection between dc1 and dc2 is cut.
>>>>> The reason is the following.
>>>>>
>>>>> 1. Only mon.5 can talk to the OSDs in both data centers; the other
>>>>> mons will not be able to. So, they can't reach an agreement (and form
>>>>> a quorum) about the state of the OSDs.
>>>>>
>>>>> 2. The OSDs in dc1 and dc2 will not be able to talk to each other,
>>>>> and since replicas span both data centers, the cluster will be broken.
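>>>>>
>>>>> (As a sketch of the kind of placement we're talking about: replicas
>>>>> spanning data centers would typically come from a CRUSH rule along
>>>>> these lines, assuming a 'datacenter' bucket type exists in the CRUSH
>>>>> map; the rule name and numbers here are hypothetical:
>>>>>
>>>>>   rule replicated_across_dcs {
>>>>>           ruleset 1
>>>>>           type replicated
>>>>>           min_size 2
>>>>>           max_size 3
>>>>>           step take default
>>>>>           step chooseleaf firstn 0 type datacenter
>>>>>           step emit
>>>>>   }
>>>>>
>>>>> With one replica placed per datacenter like this, neither side can
>>>>> satisfy the rule alone once the link is down.)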
>>>>
>>>> Yes, in fact, after some thought, I have the following question.
>>>>
>>>> If: (clearer with a diagram in mind ;))
>>>>
>>>>     1. mon.1 and mon.2 can talk together (in dc1) and can talk with
>>>> mon.5 (via the VPN) but can't talk with mon.3 and mon.4 (in dc2)
>>>>     2. mon.3 and mon.4 can talk together (in dc2) and can talk with
>>>> mon.5 (via the VPN) but can't talk with mon.1 and mon.2 (in dc1)
>>>>     3. mon.5 can talk with mon.1, mon.2, mon.3 and mon.4
>>>>
>>>> is the quorum reached? If yes, which monitors form the quorum?
>>>
>>> Yes, you should get a quorum as mon.5 will vote for one datacenter or
>>> the other. Which one it chooses will depend on which monitor has the
>>> "lowest" IP address (I think, or maybe just the monitor IDs or
>>> something? Anyway, it's a consistent ordering).
>>
>> Pet peeve alert. ^_-
>>
>> It's the lowest IP.
>
> To be more precise, it's the lowest IP:PORT combination:
>
> 10.0.1.2:6789 = rank 0
> 10.0.1.2:6790 = rank 1
> 10.0.1.3:6789 = rank 2
>
> and so on.
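>
> For instance, 'ceph mon dump' prints the monmap with the ranks that fall
> out of that ordering; a rough sketch of its output (mon names are
> illustrative, exact fields vary by version):
>
>   $ ceph mon dump
>   epoch 1
>   0: 10.0.1.2:6789/0 mon.a
>   1: 10.0.1.2:6790/0 mon.b
>   2: 10.0.1.3:6789/0 mon.c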
>
>> Which is something that really needs to be documented (better), so that
>> people can plan things accordingly and have the leader monitor wind up
>> on the best-suited hardware (in case not everything is equal).
>>
>> Other than that, the order in which the (initial?) mons are listed in
>> ceph.conf would of course be the most natural and expected way to sort
>> monitors.
>
> I don't agree.  I find it hard to rely on ceph.conf for sensitive
> decisions like this, because we would have to ensure that ceph.conf is
> the same on all nodes, and I've seen that not be the case more often
> than not.
>
> On the other hand, I do agree that we should make it easier for people
> to specify which monitors they want in the line of succession to the
> leader, so that they can plan their clusters accordingly.  I do believe
> we can set this on the monmap, ideally once the first quorum is formed;
> something like:
>
> ceph mon rank set mon.a 0
> ceph mon rank set mon.b 2
> ceph mon rank set mon.c 1
>
> ceph mon rank list
>
>   MON   IP:PORT        RANK    POLICY         STATUS
>   mon.a 10.0.1.2:6789  rank 0  [set-by-user]  leader
>   mon.c 10.0.1.3:6789  rank 1  [set-by-user]  peon
>   mon.b 10.0.1.2:6790  rank 2  [set-by-user]  down
>   mon.d 10.0.1.4:6789  rank 3  [default]      peon
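>
> In the meantime, the current ranks and leader can at least be inspected
> with 'ceph quorum_status'; a sketch of the relevant bits of its JSON
> output (trimmed, field names from memory):
>
>   $ ceph quorum_status
>   { "quorum": [ 0, 1, 2],
>     "quorum_names": [ "a", "b", "c"],
>     "quorum_leader_name": "a",
>     ... }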
>
>
> Thoughts?
>
>   -Joao
>
>>
>> Christian
>>
>>
>>> Under no circumstances
>>> whatsoever will mon.5 help each datacenter form its own quorum at the
>>> same time. The other data center will just be out of luck and unable
>>> to do anything.
>>> It's possible, though, that the formed quorum won't be very stable,
>>> since the out-of-quorum monitors will probably keep trying to form a
>>> quorum, and that might make mon.5 unhappy. You should test what
>>> happens with that kind of net split. :)
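>>>
>>> (One quick way to fake that kind of split for testing, as a sketch;
>>> the dc2 subnet here is hypothetical, adjust to your setup. On each dc1
>>> host, drop traffic to and from dc2 while leaving the VPN route to
>>> mon.5 alone:
>>>
>>>   iptables -A INPUT  -s 10.0.2.0/24 -j DROP
>>>   iptables -A OUTPUT -d 10.0.2.0/24 -j DROP
>>>
>>> and remove the rules with 'iptables -D' when you're done.)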
>>> -Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



