Re: frequent Monitor down

Typically you run 2n+1 monitor nodes to tolerate n failures.
It's OK to have 4 nodes; from a failure-tolerance point of view it's
the same as 3 nodes. 4 nodes will tolerate 1 failure; if 2 nodes are
down, the cluster loses quorum. It works, it just doesn't make much sense.
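
To make the arithmetic concrete, here is a minimal Python sketch of the
majority-quorum rule (illustrative only, not Ceph code):

    # Monitors form a quorum with a simple majority of the monmap.
    def quorum_size(n_mons: int) -> int:
        return n_mons // 2 + 1

    # Monitor failures tolerated before quorum is lost.
    def tolerated_failures(n_mons: int) -> int:
        return n_mons - quorum_size(n_mons)

    for n in (2, 3, 4, 5):
        print(f"{n} mons: quorum size {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")

Running it shows that 3 and 4 monitors both tolerate exactly one
failure, while 5 tolerate two, so the fourth monitor adds no extra
resilience.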

Thanks!
Tony
> -----Original Message-----
> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: Thursday, October 29, 2020 1:42 AM
> To: andrei <andrei@xxxxxxxxxx>; eblock <eblock@xxxxxx>
> Cc: ceph-users <ceph-users@xxxxxxx>
> Subject:  Re: frequent Monitor down
> 
> Really? This is the first time I've read this here; afaik you can get
> a split brain like this.
> 
> 
> 
> -----Original Message-----
> Sent: Thursday, October 29, 2020 12:16 AM
> To: Eugen Block
> Cc: ceph-users
> Subject:  Re: frequent Monitor down
> 
> Eugen, I've got four physical servers and I've installed mon on all of
> them. I've discussed it with Wido and a few other chaps from ceph and
> there is no issue in doing it. The quorum issues would happen if you
> have 2 mons. If you've got more than 2 you should be fine.
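> 
> If you want to verify which mons are actually in quorum, here is a
> small Python sketch that shells out to the ceph CLI (it assumes the
> ceph client and an admin keyring are available on the host, and the
> JSON field names reflect typical "ceph quorum_status" output):
> 
>     import json
>     import subprocess
> 
>     # Ask the cluster for the current monitor quorum as JSON.
>     raw = subprocess.check_output(
>         ["ceph", "quorum_status", "--format", "json"])
>     status = json.loads(raw)
> 
>     in_quorum = set(status.get("quorum_names", []))
>     all_mons = {m["name"]
>                 for m in status.get("monmap", {}).get("mons", [])}
> 
>     print("in quorum:    ", sorted(in_quorum))
>     print("out of quorum:", sorted(all_mons - in_quorum))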
> 
> Andrei
> 
> ----- Original Message -----
> > From: "Eugen Block" <eblock@xxxxxx>
> > To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> > Cc: "ceph-users" <ceph-users@xxxxxxx>
> > Sent: Wednesday, 28 October, 2020 20:19:15
> > Subject: Re:  Re: frequent Monitor down
> 
> > Why do you have 4 MONs in the first place? That way a quorum is
> > difficult to achieve; could it be related to that?
> >
> > Zitat von Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
> >
> >> Yes, I have, Eugen. I see no obvious reason / error / etc. I see a
> >> lot of entries relating to Compressing as well as the monitor going
> >> down.
> >>
> >> Andrei
> >>
> >>
> >>
> >> ----- Original Message -----
> >>> From: "Eugen Block" <eblock@xxxxxx>
> >>> To: "ceph-users" <ceph-users@xxxxxxx>
> >>> Sent: Wednesday, 28 October, 2020 11:51:20
> >>> Subject:  Re: frequent Monitor down
> >>
> >>> Have you looked into syslog and mon logs?
> >>>
> >>>
> >>> Zitat von Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
> >>>
> >>>> Hello everyone,
> >>>>
> >>>> I am regularly seeing messages that the monitors are going down
> >>>> and coming back up:
> >>>>
> >>>> 2020-10-27T09:50:49.032431+0000 mon.arh-ibstorage2-ib (mon.1)
> >>>> 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum
> >>>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
> >>>> 2020-10-27T09:50:49.123511+0000 mon.arh-ibstorage2-ib (mon.1)
> >>>> 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
> >>>> BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
> >>>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout
> >>>> flag(s) set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed
> >>>> in time
> >>>> 2020-10-27T09:50:52.735457+0000 mon.arh-ibstorage1-ib (mon.0)
> >>>> 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons
> >>>> down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
> >>>> 2020-10-27T12:35:20.556458+0000 mon.arh-ibstorage2-ib (mon.1)
> >>>> 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum
> >>>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
> >>>> 2020-10-27T12:35:20.643282+0000 mon.arh-ibstorage2-ib (mon.1)
> >>>> 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
> >>>> BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
> >>>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout
> >>>> flag(s) set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed
> >>>> in time
> >>>>
> >>>>
> >>>> This happens on a daily basis several times a day.
> >>>>
> >>>> Could you please let me know how to fix this annoying problem?
> >>>>
> >>>> I am running ceph version 15.2.4
> >>>> (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on
> >>>> Ubuntu 18.04 LTS with latest updates.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Andrei
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


