Really? First time I read this here. AFAIK you can get a split brain like this.
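For what it's worth, the monitors form a quorum by strict majority, floor(n/2) + 1, so four mons still need three in quorum and tolerate only a single failure, the same as three mons. A minimal sketch of that arithmetic (plain Python, nothing Ceph-specific; the helper names are just for illustration):

    # Majority quorum: a monitor cluster of size n needs floor(n/2) + 1
    # members up and talking to each other to form a quorum.
    def quorum_size(n: int) -> int:
        return n // 2 + 1

    def tolerated_failures(n: int) -> int:
        return n - quorum_size(n)

    for n in (2, 3, 4, 5):
        print(f"{n} mons: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} down")
    # 2 mons: quorum=2, tolerates 0 down
    # 3 mons: quorum=2, tolerates 1 down
    # 4 mons: quorum=3, tolerates 1 down
    # 5 mons: quorum=3, tolerates 2 down

So an even mon count adds another thing that can fail without adding any failure tolerance, which is why odd counts are the usual recommendation.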
-----Original Message-----
Sent: Thursday, October 29, 2020 12:16 AM
To: Eugen Block
Cc: ceph-users
Subject: Re: frequent Monitor down

Eugen, I've got four physical servers and I've installed a mon on all of them. I've discussed it with Wido and a few other chaps from Ceph and there is no issue in doing it. The quorum issues would happen if you have 2 mons; if you've got more than 2 you should be fine.

Andrei

----- Original Message -----
> From: "Eugen Block" <eblock@xxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Wednesday, 28 October, 2020 20:19:15
> Subject: Re: Re: frequent Monitor down
>
> Why do you have 4 MONs in the first place? That way a quorum is difficult to achieve, could it be related to that?
>
> Zitat von Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>
>> Yes, I have, Eugen, and I see no obvious reason / error / etc. I see a lot of entries relating to Compressing as well as the monitor going down.
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Eugen Block" <eblock@xxxxxx>
>>> To: "ceph-users" <ceph-users@xxxxxxx>
>>> Sent: Wednesday, 28 October, 2020 11:51:20
>>> Subject: Re: frequent Monitor down
>>>
>>> Have you looked into syslog and the mon logs?
>>>
>>> Zitat von Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>>
>>>> Hello everyone,
>>>>
>>>> I am regularly getting messages that the monitors are going down and up:
>>>>
>>>> 2020-10-27T09:50:49.032431+0000 mon.arh-ibstorage2-ib (mon.1) 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>>>> 2020-10-27T09:50:49.123511+0000 mon.arh-ibstorage2-ib (mon.1) 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
>>>> 2020-10-27T09:50:52.735457+0000 mon.arh-ibstorage1-ib (mon.0) 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
>>>> 2020-10-27T12:35:20.556458+0000 mon.arh-ibstorage2-ib (mon.1) 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>>>> 2020-10-27T12:35:20.643282+0000 mon.arh-ibstorage2-ib (mon.1) 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time
>>>>
>>>> This happens several times a day, every day.
>>>>
>>>> Could you please let me know how to fix this annoying problem?
>>>>
>>>> I am running ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on Ubuntu 18.04 LTS with the latest updates.
>>>>
>>>> Thanks
>>>>
>>>> Andrei
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx