Hi all,

I think I can reduce the defcon level a bit. Since I couldn't see anything in the mon log, I started testing whether a specific mon causes the trouble by shutting them down one at a time for a while. I got lucky on the first try: shutting down the leader stopped the voting. I left it down for a while and rebooted the server. Then I started the mon again, and there has still not been a new election. It looks like the reboot finally cleared the problem.

This indicates that it might be a hardware problem, although the coincidence with the MDS restart is striking and I doubt that it's just coincidence. Unfortunately, I can't find anything in the logs or health monitoring. An fsck on the mon store also found nothing.

Since this is a recurring issue, it would be great if someone could take a look at the paste https://pastebin.com/hGPvVkuR to see if there is a clue.

Thanks a lot for your help!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Thursday, May 4, 2023 1:01 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@xxxxxxx
Subject: Re: Frequent calling monitor election

Hi all,

I have to get back to this case. On Monday I had to restart an MDS to get rid of a stuck client caps recall. Right after that fail-over, the MONs went into a voting frenzy again. I already restarted all of them like last time, but this time it doesn't help. I might be in a different case here.

In an effort to collect debug info, I set debug_mon on the leader to 10/10 and it's producing voluminous output. Unfortunately, while debug_mon=10/10, the voting frenzy is not happening. It seems that I'm a bit in the situation described with "Tip: When debug output slows down your system, the latency can hide race conditions." at https://docs.ceph.com/en/octopus/rados/troubleshooting/log-and-debug/. The election frequency is significantly lower when debug_mon=10/10.
I managed to catch one though and pasted the 20s before the election happened here: https://pastebin.com/hGPvVkuR . I hope there is a clue, I can't see anything that sticks out. Is there anything else I can look for?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Thursday, February 9, 2023 5:29 PM
To: Gregory Farnum; Dan van der Ster
Cc: ceph-users@xxxxxxx
Subject: Re: Frequent calling monitor election

Hi Dan and Gregory,

thanks! These are good pointers. Will look into that tomorrow.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Gregory Farnum <gfarnum@xxxxxxxxxx>
Sent: 09 February 2023 17:12:23
To: Dan van der Ster
Cc: Frank Schilder; ceph-users@xxxxxxx
Subject: Re: Re: Frequent calling monitor election

Also, that the current leader (ceph-01) is one of the monitors proposing an election each time suggests the problem is with getting commit acks back from one of its followers.

On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster <dvanders@xxxxxxxxx> wrote:
>
> Hi Frank,
>
> Check the mon logs with some increased debug levels to find out what
> the leader is busy with.
> We have a similar issue (though, daily) and it turned out to be
> related to the mon leader timing out doing a SMART check.
> See https://tracker.ceph.com/issues/54313 for how I debugged that.
>
> Cheers, Dan
>
> On Thu, Feb 9, 2023 at 7:56 AM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi all,
> >
> > our monitors have enjoyed democracy since the beginning.
> > However, I don't share a sudden excitement about voting:
> >
> > 2/9/23 4:42:30 PM[INF]overall HEALTH_OK
> > 2/9/23 4:42:30 PM[INF]mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:42:26 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-25 calling monitor election
> > 2/9/23 4:42:26 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:40:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:30:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:34 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:34 PM[INF]mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-03 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-25 calling monitor election
> > 2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:24:04 PM[INF]overall HEALTH_OK
> > 2/9/23 4:24:03 PM[INF]mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 4:23:59 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 4:23:59 PM[INF]mon.ceph-02 calling monitor election
> > 2/9/23 4:20:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:10:00 PM[INF]overall HEALTH_OK
> > 2/9/23 4:00:00 PM[INF]overall HEALTH_OK
> > 2/9/23 3:50:00 PM[INF]overall HEALTH_OK
> > 2/9/23 3:43:13 PM[INF]overall HEALTH_OK
> > 2/9/23 3:43:13 PM[INF]mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
> > 2/9/23 3:43:08 PM[INF]mon.ceph-01 calling monitor election
> > 2/9/23 3:43:08 PM[INF]mon.ceph-26 calling monitor election
> > 2/9/23 3:43:08 PM[INF]mon.ceph-25 calling monitor election
> >
> > We moved a switch from one rack to another and after the switch came back up, the monitors frequently bitch about who is the alpha. How do I get them to focus more on their daily duties again?
> >
> > Thanks for any help!
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
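
A minimal Python sketch, not part of the original thread, for checking two things discussed above: which mons call each election (Gregory's point that the leader ceph-01 is always among the callers) and how far apart the elections are. The line format is inferred only from the log excerpt quoted in this thread, so real cluster logs may need a different pattern. The names LINE_RE, parse_events, election_rounds and summarize are hypothetical helpers, not Ceph tooling. SAMPLE is the newest 17 lines of the quoted log, copied unchanged.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line format, taken from the quoted excerpt, e.g.
# "2/9/23 4:42:26 PM[INF]mon.ceph-01 calling monitor election".
LINE_RE = re.compile(
    r"(?P<ts>\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}:\d{2} [AP]M)"
    r"\[(?P<level>\w+)\]"
    r"mon\.(?P<mon>[\w-]+) (?P<what>calling monitor election|is new leader)"
)


def parse_events(lines):
    """Return (timestamp, mon, kind) tuples for election-related lines only."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skips e.g. "overall HEALTH_OK" lines
        ts = datetime.strptime(m["ts"], "%m/%d/%y %I:%M:%S %p")
        kind = "leader" if m["what"] == "is new leader" else "call"
        events.append((ts, m["mon"], kind))
    # Oldest first; on equal timestamps, calls sort before the leader line.
    events.sort(key=lambda e: (e[0], e[2] == "leader"))
    return events


def election_rounds(events):
    """Group calls into rounds, each closed by an 'is new leader' line.

    Calls that are not followed by a leader line are dropped.
    """
    rounds, callers = [], set()
    for ts, mon, kind in events:
        if kind == "call":
            callers.add(mon)
        else:
            rounds.append({"time": ts, "leader": mon, "callers": sorted(callers)})
            callers = set()
    return rounds


def summarize(rounds):
    """Count rounds in which each mon called, and seconds between rounds."""
    calls = Counter(mon for r in rounds for mon in r["callers"])
    gaps = [
        (b["time"] - a["time"]).total_seconds()
        for a, b in zip(rounds, rounds[1:])
    ]
    return calls, gaps


SAMPLE = """\
2/9/23 4:42:30 PM[INF]overall HEALTH_OK
2/9/23 4:42:30 PM[INF]mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
2/9/23 4:42:26 PM[INF]mon.ceph-01 calling monitor election
2/9/23 4:42:26 PM[INF]mon.ceph-26 calling monitor election
2/9/23 4:42:26 PM[INF]mon.ceph-25 calling monitor election
2/9/23 4:42:26 PM[INF]mon.ceph-02 calling monitor election
2/9/23 4:40:00 PM[INF]overall HEALTH_OK
2/9/23 4:30:00 PM[INF]overall HEALTH_OK
2/9/23 4:24:34 PM[INF]overall HEALTH_OK
2/9/23 4:24:34 PM[INF]mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4)
2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
2/9/23 4:24:29 PM[INF]mon.ceph-03 calling monitor election
2/9/23 4:24:29 PM[INF]mon.ceph-01 calling monitor election
2/9/23 4:24:29 PM[INF]mon.ceph-26 calling monitor election
2/9/23 4:24:29 PM[INF]mon.ceph-25 calling monitor election
2/9/23 4:24:29 PM[INF]mon.ceph-02 calling monitor election
""".splitlines()

rounds = election_rounds(parse_events(SAMPLE))
calls, gaps = summarize(rounds)
for r in rounds:
    print(r["time"], "leader", r["leader"], "callers", r["callers"])
print("rounds called per mon:", dict(calls))
print("seconds between elections:", gaps)
```

Run against a longer log, the per-mon counts show whether one mon is missing from the callers or present in every round, and the gaps show whether the election frequency changes after an intervention such as raising debug_mon or restarting a mon.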