Re: Feature request: "max mon" setting

On Thursday 03 November 2011 you wrote:
> On Thu, Nov 3, 2011 at 05:02, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> > Documentation recommends three monitors. In our special cluster
> > configuration, this would mean that if accidentally two nodes with
> > monitors fail (e.g. one in maintenance and one crashes), the whole
> > cluster dies. What I would really
>
> If you feel two monitors going down is too likely, run a monitor
> cluster of size 5. And if you feel 3 monitors going down is too
> likely, run 7.

So there is no problem in having many more than three, as long as the number
is odd.
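
Just to double-check the arithmetic for myself, a quick Python sketch
(assuming quorum means a strict majority, floor(n/2) + 1, of all defined
monitors):

    # Failures a monitor cluster of size n tolerates while keeping quorum,
    # assuming quorum requires a strict majority of all defined monitors.
    for n in (1, 2, 3, 5, 7):
        quorum = n // 2 + 1
        print("%d monitors: quorum %d, tolerates %d failure(s)"
              % (n, quorum, n - quorum))
    # 1 monitors: quorum 1, tolerates 0 failure(s)
    # 2 monitors: quorum 2, tolerates 0 failure(s)
    # 3 monitors: quorum 2, tolerates 1 failure(s)
    # 5 monitors: quorum 3, tolerates 2 failure(s)
    # 7 monitors: quorum 4, tolerates 3 failure(s)

An even count adds nothing: 4 monitors tolerate only 1 failure, the same as
3, so an odd number is indeed the sensible choice.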

> > like would be that I can define a monitor on each node and e.g. set "max
> > mon = 3". Each monitor starting up can then check how many monitors are
> > already up and go to standby, if the number has already been reached.
> > Regular rechecking could allow another monitor to become active, if one
> > of the previously active monitors has died. Just like "max mds" actually.
>
> Unfortunately, that is fundamentally not wanted. That would let a
> so-called "split brain" situation occur, and the whole purpose of the
> majority rule of monitors is to ensure that it does not happen. If we
> didn't care about that, the monitors left standing would never need to
> stop operating.

Alright, I see your point. A network split could still be unfortunate even
with e.g. 7 monitors, say 4 in one partition and 3 in the other. So it seems
best to really stick with 3.
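
To convince myself, a small sketch of that 4/3 split under the same
majority-rule assumption:

    # With 7 monitors partitioned 4/3, only a side holding a strict
    # majority of ALL defined monitors keeps running; at most one side
    # can ever have that majority, so split brain cannot happen.
    total = 7
    quorum = total // 2 + 1  # 4
    for side in (4, 3):
        print("partition with %d monitors: %s"
              % (side, "keeps quorum" if side >= quorum else "stops"))
    # partition with 4 monitors: keeps quorum
    # partition with 3 monitors: stops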

> > A special case that gives me most headaches is the case of just two
> > active nodes. According to documentation, the monitor problem means that
> > one failing monitor kills the cluster whatever the number of defined
> > monitors (1 or 2), even if we have all data safely placed on both nodes.
>
> Yes, 1 or 2 total physical nodes in a Ceph cluster makes it hard to do
> HA. You could run just one ceph-mon, then the other node failing
> doesn't affect the cluster at all (but naturally the one running
> ceph-mon must not fail).
>
> Perhaps you can get a third machine to also be a monitor, even if it
> doesn't participate in storage etc. ceph-mon is a very lightweight
> process. Share a server that has other responsibilities, or buy the
> cheapest atom netbook you can find -- it should do the job just fine.

Yes, this sounds like a useful idea.
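
For our two-node case, here is how I now understand the placement options
(again only a sketch under the majority-rule assumption; the node names are
made up):

    # Two storage nodes: which single-machine failures leave a monitor
    # quorum, for three hypothetical monitor placements (node -> mon count).
    placements = {
        "mon on each node":             {"node1": 1, "node2": 1},
        "single mon on node1":          {"node1": 1, "node2": 0},
        "each node plus cheap 3rd box": {"node1": 1, "node2": 1, "extra": 1},
    }
    for name, mons in placements.items():
        total = sum(mons.values())
        quorum = total // 2 + 1
        ok = [node for node in mons if total - mons[node] >= quorum]
        print("%s: survives failure of %s" % (name, ", ".join(ok) or "no node"))
    # mon on each node: survives failure of no node
    # single mon on node1: survives failure of node2
    # each node plus cheap 3rd box: survives failure of node1, node2, extra

So two monitors are strictly worse than one, and the cheap third machine
really does buy us one survivable failure.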

> Clusters of <=2 nodes are pretty far from what Ceph was designed for,
> and while we use them all the time for testing, running a setup that
> small for real is pretty rare. Sorry.

I agree that our needs are special. We want to be able to start with 2 nodes
and extend as customer needs grow, or start with 20 and shut some of them
down when load gets low. I now understand that we should start with at least
three nodes.

Thank you for clearing things up!

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649