Hi Denis,

Which Ceph version is your cluster running? I know there was an issue in Pacific versions prior to 16.2.6 where mons were dropped from the monmap (and therefore stuck out of quorum) when their host was rebooted: https://tracker.ceph.com/issues/51027. If you're on a Pacific version older than 16.2.6, it could be the same issue, and workarounds are discussed in the tracker. Even if you are on 16.2.6, the workarounds in that tracker could still be helpful.

On Tue, Oct 19, 2021 at 12:07 PM Denis Polom <denispolom@xxxxxxxxx> wrote:
> Hi,
>
> one of our monitor VMs was rebooted and is not rejoining quorum (the quorum
> consists of 3 monitors). While the monitor service (ceph1) is running on
> this VM, the Ceph cluster becomes unreachable. In the monitor logs on the ceph3 VM I
> can see a lot of the following messages:
>
> 2021-10-19 17:50:19.555 7fe49e912700 0 log_channel(audit) log [DBG] : from='client.? 10.13.68.11:0/1846917599' entity='client.admin' cmd=[{"prefix": "osd blacklist ls"}]: dispatch
> 2021-10-19 17:50:20.255 7fe4a1117700 1 mon.ceph3@1(leader).paxos(paxos updating c 95374479..95375018) accept timeout, calling fresh election
> 2021-10-19 17:50:20.255 7fe49e912700 0 log_channel(cluster) log [INF] : mon.ceph3 calling monitor election
> 2021-10-19 17:50:20.255 7fe49e912700 1 mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748
> 2021-10-19 17:50:20.263 7fe49e912700 -1 mon.ceph3@1(electing) e4 failed to get devid for : fallback method has serial ''but no model
> 2021-10-19 17:50:21.491 7fe49b90c700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:23.567 7fe49b90c700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:23.771 7fe49b90c700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:24.175 7fe49c90e700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:24.979 7fe49c90e700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:25.223 7fe49c90e700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:25.263 7fe4a1117700 1 mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749, mid-election, bumping
> 2021-10-19 17:50:25.271 7fe49c90e700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
> 2021-10-19 17:50:25.279 7fe4a1117700 -1 mon.ceph3@1(electing) e4 failed to get devid for : fallback method has serial ''but no model
> 2021-10-19 17:50:25.487 7fe49c90e700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
>
> NTP is running on all nodes in the cluster and the time is correctly in sync.
>
> Any help would be appreciated.
>
> thx!
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
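To answer the version question, `ceph versions` (or `ceph --version` on each mon host) prints strings you can compare against 16.2.6. The Python sketch below does that comparison. It assumes strings of the form `ceph version X.Y.Z (<sha1>) <release> (stable)`, and the helper names are hypothetical. It only flags Pacific (16.x) releases older than 16.2.6, the range mentioned for tracker issue 51027. It cannot tell you whether a cluster actually hit that bug.

```python
import re

# Assumption: input looks like `ceph --version` output, e.g.
# "ceph version 16.2.5 (<sha1>) pacific (stable)".
VERSION_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def parse_ceph_version(text: str) -> tuple:
    """Extract (major, minor, patch) as ints from a ceph version string."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised ceph version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def may_hit_issue_51027(text: str) -> bool:
    """Hypothetical helper: True for Pacific (16.x) releases older than 16.2.6."""
    version = parse_ceph_version(text)
    # Tuples compare element by element, so (16, 2, 5) < (16, 2, 6).
    return version[0] == 16 and version < (16, 2, 6)


if __name__ == "__main__":
    samples = [
        "ceph version 16.2.5 (<sha1>) pacific (stable)",
        "ceph version 16.2.6 (<sha1>) pacific (stable)",
        "ceph version 14.2.22 (<sha1>) nautilus (stable)",
    ]
    for sample in samples:
        print(parse_ceph_version(sample), may_hit_issue_51027(sample))
```

Only the version number matters here. Being on a non-Pacific release, or on 16.2.6 or later, does not rule out a similar symptom.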
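To check whether a rebooted mon was actually dropped from the monmap, you can compare the monitor names in `ceph mon dump --format json` (run while the cluster is reachable) with the names you expect. This Python sketch assumes the dump has a top-level `mons` list with a `name` field on each entry. The function `missing_monitors` and the third monitor name `ceph2` are hypothetical.

```python
import json


def missing_monitors(monmap_json: str, expected: set) -> set:
    """Return the expected monitor names that are absent from a JSON monmap dump.

    Assumes the layout of `ceph mon dump --format json`: a top-level
    "mons" list whose entries each carry a "name" field.
    """
    monmap = json.loads(monmap_json)
    present = {mon["name"] for mon in monmap.get("mons", [])}
    return expected - present


if __name__ == "__main__":
    # Hypothetical, trimmed dump in which ceph1 is no longer in the map;
    # "ceph2" is an assumed name for the third monitor.
    dump = json.dumps({"mons": [{"rank": 0, "name": "ceph2"},
                                {"rank": 1, "name": "ceph3"}]})
    print(sorted(missing_monitors(dump, {"ceph1", "ceph2", "ceph3"})))  # ['ceph1']
```

A non-empty result means the monmap no longer lists that mon. In that case, the workarounds in the tracker issue are the relevant place to look.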