Really? First time I read this here. AFAIK you can get a split brain like this.
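For what it's worth, the monitors form a quorum by strict majority, floor(n/2) + 1, so four mons still need three in quorum and tolerate only a single failure, the same as three mons. A minimal sketch of that arithmetic (plain Python, nothing Ceph-specific; the helper names are just for illustration):

    # Majority quorum: a monitor cluster of size n needs floor(n/2) + 1
    # members up and talking to each other to form a quorum.
    def quorum_size(n: int) -> int:
        return n // 2 + 1

    def tolerated_failures(n: int) -> int:
        return n - quorum_size(n)

    for n in (2, 3, 4, 5):
        print(f"{n} mons: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} down")
    # 2 mons: quorum=2, tolerates 0 down
    # 3 mons: quorum=2, tolerates 1 down
    # 4 mons: quorum=3, tolerates 1 down
    # 5 mons: quorum=3, tolerates 2 down

So an even mon count adds another thing that can fail without adding any failure tolerance, which is why odd counts are the usual recommendation.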
-----Original Message-----
Sent: Thursday, October 29, 2020 12:16 AM
To: Eugen Block
Cc: ceph-users
Subject: Re: frequent Monitor down

Eugen, I've got four physical servers and I've installed a mon on all of them. I've discussed it with Wido and a few other chaps from Ceph and there is no issue in doing it. The quorum issues would happen if you have 2 mons; if you've got more than 2 you should be fine.

Andrei

----- Original Message -----
> From: "Eugen Block" <eblock@xxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Wednesday, 28 October, 2020 20:19:15
> Subject: Re: Re: frequent Monitor down
>
> Why do you have 4 MONs in the first place? That way a quorum is difficult to achieve, could it be related to that?
>
> Zitat von Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>
>> Yes, I have, Eugen, and I see no obvious reason / error / etc. I see a lot of entries relating to Compressing as well as the monitor going down.
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Eugen Block" <eblock@xxxxxx>
>>> To: "ceph-users" <ceph-users@xxxxxxx>
>>> Sent: Wednesday, 28 October, 2020 11:51:20
>>> Subject: Re: frequent Monitor down
>>>
>>> Have you looked into syslog and the mon logs?
>>>
>>> Zitat von Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>>
>>>> Hello everyone,
>>>>
>>>> I am regularly getting messages that the monitors are going down and up:
>>>>
>>>> 2020-10-27T09:50:49.032431+0000 mon.arh-ibstorage2-ib (mon.1) 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>>>> 2020-10-27T09:50:49.123511+0000 mon.arh-ibstorage2-ib (mon.1) 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
>>>> 2020-10-27T09:50:52.735457+0000 mon.arh-ibstorage1-ib (mon.0) 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
>>>> 2020-10-27T12:35:20.556458+0000 mon.arh-ibstorage2-ib (mon.1) 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>>>> 2020-10-27T12:35:20.643282+0000 mon.arh-ibstorage2-ib (mon.1) 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time
>>>>
>>>> This happens several times a day, every day.
>>>>
>>>> Could you please let me know how to fix this annoying problem?
>>>>
>>>> I am running ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on Ubuntu 18.04 LTS with the latest updates.
>>>>
>>>> Thanks
>>>>
>>>> Andrei
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx