On Thursday 03 November 2011 you wrote:
> On Thu, Nov 3, 2011 at 05:02, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> > Documentation recommends three monitors. In our special cluster
> > configuration, this would mean that if accidentally two nodes with
> > monitors fail (e.g. one in maintenance and one crashes), the whole
> > cluster dies. What I would really
>
> If you feel two monitors going down is too likely, run a monitor
> cluster of size 5. And if you feel 3 monitors going down is too
> likely, run 7.

So there is no problem in having more than three, as long as it is an
odd number.
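Just to check my understanding of the arithmetic (a rough sketch of the
majority rule as I understand it, not code taken from Ceph itself):

    # Monitor quorum arithmetic, as I understand it: a majority of
    # floor(n/2) + 1 monitors must be up for the cluster to operate.
    for n in (3, 5, 7):
        quorum = n // 2 + 1
        print("%d monitors: quorum %d, tolerates %d failure(s)"
              % (n, quorum, n - quorum))

So 3 monitors tolerate 1 failure, 5 tolerate 2, and 7 tolerate 3.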
> > like would be that I can define a monitor on each node and e.g. set
> > "max mon = 3". Each monitor starting up can then check how many
> > monitors are already up and go to standby, if the number has already
> > been reached. Regular rechecking could allow another monitor to
> > become active, if one of the previously active monitors has died.
> > Just like "max mds" actually.
>
> Unfortunately, that is fundamentally not wanted. That would let a
> so-called "split brain" situation occur, and the whole purpose of the
> majority rule of monitors is to ensure that it does not happen. If we
> didn't care about that, the monitors left standing would never need to
> stop operating.

Alright, I see your point. A network split would be unfortunate, and it
could also cause problems even with e.g. 7 monitors: after a 4/3 split,
the 4 monitors in one partition keep quorum, but the 3 in the other
partition stall. So it would be best to really stick with 3.
> > A special case that gives me most headaches is the case of just two
> > active nodes. According to documentation, the monitor problem means
> > that one failing monitor kills the cluster whatever the number of
> > defined monitors (1 or 2), even if we have all data safely placed
> > on both nodes.
>
> Yes, 1 or 2 total physical nodes in a Ceph cluster make it hard to do
> HA. You could run just one ceph-mon, then the other node failing
> doesn't affect the cluster at all (but naturally the one running
> ceph-mon must not fail).
>
> Perhaps you can get a third machine to also be a monitor, even if it
> doesn't participate in storage etc. ceph-mon is a very lightweight
> process. Share a server that has other responsibilities, or buy the
> cheapest atom netbook you can find -- it should do the job just fine.

Yes, this sounds like a useful idea.
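If I understand the configuration correctly, that would look roughly
like this in ceph.conf (only a sketch: the hostnames and addresses are
made up, and I have not tested it):

    [mon.a]
        host = node1
        mon addr = 192.168.0.1:6789

    [mon.b]
        host = node2
        mon addr = 192.168.0.2:6789

    # the third, storage-less machine: runs only ceph-mon, no OSDs
    [mon.c]
        host = tinybox
        mon addr = 192.168.0.3:6789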
> Clusters of <=2 nodes are pretty far from what Ceph was designed for,
> and while we use them all the time for testing, running a setup that
> small for real is pretty rare. Sorry.

I agree that our needs are special. We want to be able to start with 2
nodes and extend as customer needs grow, or start with 20 and shut some
of them down if load gets low. I now understand that we should start
with at least three nodes.

Thank you for clearing things up!

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649