monitor not joining quorum

Denis Polom <denispolom@xxxxxxxxx> · Tue, 19 Oct 2021 18:03:28 +0200

Hi,

One of our monitor VMs was rebooted and has not rejoined the quorum since 
(the quorum consists of 3 monitors). While the monitor service (ceph1) is 
running on this VM, the Ceph cluster becomes unreachable. In the monitor 
logs on the ceph3 VM I can see a lot of the following messages:

2021-10-19 17:50:19.555 7fe49e912700  0 log_channel(audit) log [DBG] : 
from='client.? 10.13.68.11:0/1846917599' entity='client.admin' 
cmd=[{"prefix": "osd blacklist ls"}]: dispatch
2021-10-19 17:50:20.255 7fe4a1117700  1 mon.ceph3@1(leader).paxos(paxos 
updating c 95374479..95375018) accept timeout, calling fresh election
2021-10-19 17:50:20.255 7fe49e912700  0 log_channel(cluster) log [INF] : 
mon.ceph3 calling monitor election
2021-10-19 17:50:20.255 7fe49e912700  1 
mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748
2021-10-19 17:50:20.263 7fe49e912700 -1 mon.ceph3@1(electing) e4 failed 
to get devid for : fallback method has serial ''but no model
2021-10-19 17:50:21.491 7fe49b90c700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:23.567 7fe49b90c700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:23.771 7fe49b90c700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:24.175 7fe49c90e700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:24.979 7fe49c90e700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:25.223 7fe49c90e700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:25.263 7fe4a1117700  1 
mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749, 
mid-election, bumping
2021-10-19 17:50:25.271 7fe49c90e700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
2021-10-19 17:50:25.279 7fe4a1117700 -1 mon.ceph3@1(electing) e4 failed 
to get devid for : fallback method has serial ''but no model
2021-10-19 17:50:25.487 7fe49c90e700  1 mon.ceph3@1(electing) e4 
handle_auth_request failed to assign global_id
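
To make the pattern in this excerpt easier to see, the records can be tallied with a small script. This is only a rough sketch: the function names (join_wrapped, summarize) and the counter keys are made up for illustration, and it assumes the log layout shown above (date, time, thread id, level, message), with long records wrapped across several lines. It counts the "accept timeout" and "failed to assign global_id" messages and collects the election epochs reported by the elector.

```python
import re
from collections import Counter

# Start of a monitor log record as seen above: date, time, thread id, level.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [0-9a-f]+ +-?\d+\b"
)
# Election epoch as printed by the elector, e.g. "elector(42748) init".
EPOCH = re.compile(r"elector\((\d+)\) init")


def join_wrapped(lines):
    """Rejoin records that were wrapped across several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous record.
            records[-1] += " " + line
    return records


def summarize(lines):
    """Count the recurring messages and list the election epochs seen."""
    counts = Counter()
    epochs = []
    for record in join_wrapped(lines):
        if "accept timeout, calling fresh election" in record:
            counts["accept_timeout"] += 1
        if "failed to assign global_id" in record:
            counts["auth_failed"] += 1
        match = EPOCH.search(record)
        if match:
            epochs.append(int(match.group(1)))
    return counts, epochs


if __name__ == "__main__":
    # A few lines copied from the excerpt above, still wrapped.
    SAMPLE = [
        "2021-10-19 17:50:20.255 7fe49e912700  1 ",
        "mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748",
        "2021-10-19 17:50:21.491 7fe49b90c700  1 mon.ceph3@1(electing) e4 ",
        "handle_auth_request failed to assign global_id",
    ]
    counts, epochs = summarize(SAMPLE)
    print(dict(counts), epochs)
```

Run over the full mon log, a steadily increasing list of epochs together with many auth failures would match what is visible above: ceph3 keeps restarting the election, and clients cannot authenticate while it is in the electing state.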

NTP is running on all nodes in the cluster and the clocks are in sync.

Any help would be appreciated.

thx!
