Hello,

On Thu, 01 Jan 2015 18:25:47 +1300 Mark Kirkwood wrote:

> The number of monitors recommended and the fact that a voting quorum is
> the way it works is covered here:
>
> http://ceph.com/docs/master/rados/deployment/ceph-deploy-mon/
>
> but I agree that you should probably not get a HEALTH OK status when you
> have just setup 2 (or in fact any even number of) monitors...HEALTH WARN
> would make more sense, with a wee message suggesting adding at least one
> more!
>

I think what Jiri meant is that when the whole cluster goes into a deadlock
due to losing monitor quorum, ceph -s etc. won't work anymore either.

And while the cluster rightfully shouldn't be doing anything in such a
state, querying the surviving/reachable monitor and being told as much
would indeed be a nice feature, as opposed to deafening silence.

As for your suggestion, while certainly helpful, it is my not so humble
opinion that the WARN state right now is totally overloaded and quite
frankly bogus.
This is particularly a problem with monitoring plugins that just pick up
the WARN state without further discrimination.
And some WARN states like slow requests are pretty much an ERR state for
most people; requests stalled for more than 30 seconds (or days!) are a
sign of something massively wrong and likely to have customer/client
impact.

I think a neat solution would be the ability to assign all possible
problem states a value like ERR, WARN, NOTE.
A cluster with just 1 or 2 monitors or having scrub disabled is (for me)
worth a NOTE, but not a WARN.

Christian

> Regards
>
> Mark
>
>
> On 01/01/15 18:06, Jiri Kanicky wrote:
> > Hi,
> >
> > I think you are right. I was too focused on the following line in docs:
> > "A cluster will run fine with a single monitor; however,*a single
> > monitor is a single-point-of-failure*." I will try to add another
> > monitor. Hopefully, this will fix my issue.
> >
> > Anyway, I think that "ceph status" or "ceph health" should report at
> > least something in such state. It's quite weird that everything stops...
> >
> > Thank you
> > Jiri
> >
> > On 1/01/2015 15:51, Lindsay Mathieson wrote:
> >> On Thu, 1 Jan 2015 03:46:33 PM Jiri Kanicky wrote:
> >>> Hi,
> >>>
> >>> I have:
> >>> - 2 monitors, one on each node
> >>> - 4 OSDs, two on each node
> >>> - 2 MDS, one on each node
> >> POOMA U here, but I don't think you can reach quorum with one out of
> >> two monitors, you need an odd number:
> >>
> >> http://ceph.com/docs/master/rados/configuration/mon-config-ref/#monitor-quorum
> >>
> >> Perhaps try removing one monitor, so you only have one left, then
> >> take the node without a monitor down.
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
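
P.S. To make the quorum arithmetic from this thread concrete, here is a
minimal sketch. It assumes only that a quorum is a strict majority of the
configured monitors, which is how the linked docs describe it. The function
names are made up for illustration and are not part of any Ceph tool.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors` voters (hypothetical helper)."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be lost while a majority still remains."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Compare small monitor counts side by side.
    for n in range(1, 6):
        print(f"{n} monitor(s): quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Under that assumption, 2 monitors tolerate no more failures than 1, and 4
no more than 3, which is why going from one monitor to two does not remove
the single point of failure.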