Re: redundancy with 2 nodes

Hello,

On Thu, 01 Jan 2015 18:25:47 +1300 Mark Kirkwood wrote:

> The number of monitors recommended and the fact that a voting quorum is 
> the way it works is covered here:
> 
> http://ceph.com/docs/master/rados/deployment/ceph-deploy-mon/
> 
> but I agree that you should probably not get a HEALTH OK status when you 
> have just set up 2 (or in fact any even number of) monitors...HEALTH WARN 
> would make more sense, with a wee message suggesting adding at least one 
> more!
> 

I think what Jiri meant is that when the whole cluster goes into a deadlock
due to losing monitor quorum, ceph -s etc. won't work anymore either.

And while the cluster rightfully shouldn't be doing anything in such a
state, querying the surviving/reachable monitor and being told as much
would indeed be a nice feature, as opposed to deafening silence.
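
For reference, the majority arithmetic behind the quorum loss discussed
above can be sketched as follows. This is only an illustration of
"strict majority of voters", not Ceph code; the function names
quorum_size and tolerated_failures are made up for this example.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority among `monitors` voters."""
    if monitors < 1:
        raise ValueError("need at least one monitor")
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be lost while a majority still remains."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Compare small monitor counts side by side.
    for n in range(1, 6):
        print(f"{n} monitors: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With 2 monitors the majority is 2, so losing either one loses quorum,
and 4 monitors tolerate no more failures than 3 do.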

As for your suggestion, while certainly helpful, it is my not so humble
opinion that the WARN state right now is totally overloaded and quite
frankly bogus.
This is particularly a problem with monitoring plugins that just pick up
the WARN state without further discrimination.

And some WARN states like slow requests are pretty much an ERR state for
most people: requests stalled for more than 30 seconds (or days!) are a
sign of something massively wrong and likely to have customer/client
impact.

I think a neat solution would be the ability to assign all possible
problem states a value like ERR, WARN, NOTE.

A cluster with just 1 or 2 monitors or having scrub disabled is (for me)
worth a NOTE, but not a WARN.
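
A minimal sketch of what I mean, assuming a per-check severity table
that the admin can override. Nothing here is an existing Ceph option;
the check names (few_monitors, scrub_disabled, slow_requests), the
default levels and the function overall_status are all hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    NOTE = 0
    WARN = 1
    ERR = 2


# Hypothetical check names with WARN as the default, for illustration only.
DEFAULT_SEVERITY = {
    "few_monitors": Severity.WARN,
    "scrub_disabled": Severity.WARN,
    "slow_requests": Severity.WARN,
}


def overall_status(active_checks, overrides=None):
    """Return the worst severity name among active checks, or "OK"."""
    levels = {**DEFAULT_SEVERITY, **(overrides or {})}
    if not active_checks:
        return "OK"
    # Unknown checks fall back to WARN in this sketch.
    worst = max(levels.get(check, Severity.WARN) for check in active_checks)
    return worst.name


# Site-specific preferences matching the examples in this mail.
my_overrides = {
    "few_monitors": Severity.NOTE,
    "scrub_disabled": Severity.NOTE,
    "slow_requests": Severity.ERR,
}
```

With these overrides, a cluster whose only active check is few_monitors
would report NOTE, while one with slow_requests would report ERR, and a
monitoring plugin could alert on the latter only.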

Christian

> Regards
> 
> Mark
> 
> 
> On 01/01/15 18:06, Jiri Kanicky wrote:
> > Hi,
> >
> > I think you are right. I was too focused on the following line in docs:
> > "A cluster will run fine with a single monitor; however,*a single
> > monitor is a single-point-of-failure*." I will try to add another
> > monitor. Hopefully, this will fix my issue.
> >
> > Anyway, I think that "ceph status" or "ceph health" should report at
> > least something in such a state. It's quite weird that everything stops...
> >
> > Thank you
> > Jiri
> >
> > On 1/01/2015 15:51, Lindsay Mathieson wrote:
> >> On Thu, 1 Jan 2015 03:46:33 PM Jiri Kanicky wrote:
> >>> Hi,
> >>>
> >>> I have:
> >>> - 2 monitors, one on each node
> >>> - 4 OSDs, two on each node
> >>> - 2 MDS, one on each node
> >> POOMA U here, but I don't think you can reach quorum with one out of
> >> two monitors, you need an odd number:
> >>
> >> http://ceph.com/docs/master/rados/configuration/mon-config-ref/#monitor-quorum
> >>
> >> Perhaps try removing one monitor, so you only have one left, then
> >> take the node without a monitor down.
> >>
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/