Re: MON quorum a single point of failure?


 



On Thursday, June 20, 2013, Bo wrote:
>
> Howdy!
>
> Loving working with ceph; learning a lot. :)
>
> I am curious about the quorum process because I seem to get conflicting information from "experts". Those that I report to need a clear answer from me which I am currently unable to give.
>
> Ceph needs an odd number of monitors in any given cluster (3, 5, 7) to avoid split-brain syndrome. So what happens whenever I have 3 monitors, 1 dies, and I have 2 left?
>
> The information regarding this situation that I have gathered over the past few months all falls within these three categories:
> A) commonly "stated"--nothing is said. period.
> B) rarely stated--this is a bad situation (possibly split-brain).
> C) rarely stated--each monitor has a "rank", so the highest ranking monitor is the boss, thus quorum.
>
> Does anyone know with absolute certainty what ceph's quorum logic will do with an even number of (specifically 2) monitors left?
>
> You may say, "well, take down one of your monitors", to which I respectfully state that my testing is not an authoritative answer on what ceph is designed to do and what it does in production. My testing cannot cover the vast majority of cases covered by the hundreds/thousands who have had a monitor die.
>
> Thank you for your time and brain juice,
> -bo


This is often misunderstood, but the answers to your questions are
pretty simple. :)

There is no risk of split brain in Ceph (so, not in the monitors either).
The mantra to use an odd number of monitors is *not* a system
requirement; it is a deployment recommendation. This is due to how the
cluster avoids split brain: it uses a Paxos variant in which a strict
majority of the monitors needs to agree on everything. Using one
monitor, you can make forward progress if it's running; using two
monitors, you cannot afford to lose either of them (because then you
only have 50% up, which is not a majority); using three monitors you
can lose one; using four you can lose one; using five you can lose two;
etc. So in your case, with three monitors and one dead, the two
survivors are still a strict majority (2 of 3) and keep quorum. Using
an even number of monitors increases your odds of failure without
increasing the number of failures you can survive (in availability
terms) over the previous odd number.
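
The majority arithmetic above can be sketched in a few lines of Python.
This is only an illustration of the strict-majority rule, not Ceph's
actual code; the function names are made up for the example, and
total_monitors is assumed to mean the number of monitors configured in
the cluster, not the number currently up.

```python
def quorum_size(total_monitors: int) -> int:
    """Smallest number of monitors that is a strict majority (> 50%)."""
    return total_monitors // 2 + 1


def tolerated_failures(total_monitors: int) -> int:
    """How many monitors can die while a strict majority survives."""
    return total_monitors - quorum_size(total_monitors)


def has_quorum(total_monitors: int, monitors_up: int) -> bool:
    """True if the monitors still up form a strict majority of the total."""
    return monitors_up >= quorum_size(total_monitors)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} monitors: quorum needs {quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

For three configured monitors with one down, has_quorum(3, 2) is True;
for two configured monitors with one down, has_quorum(2, 1) is False.
Four monitors tolerate the same single failure as three, which is the
reason for the odd-number recommendation.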
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




