Re: monitor not joining quorum

Hi Denis,

Which Ceph version is your cluster running? There was a known issue in
Pacific releases prior to 16.2.6 where mons were dropped from the monmap
(and were therefore stuck out of quorum) when their host was rebooted
(https://tracker.ceph.com/issues/51027). If you're on a Pacific release
older than 16.2.6, you may be hitting that same issue; workarounds are
discussed in the tracker. Even if you are on 16.2.6, the workarounds in
that tracker could still be helpful.
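
If it helps to check this across several hosts, here is a minimal sketch
of the version test described above. It assumes the usual
"ceph version X.Y.Z (...)" string printed by `ceph --version`; the helper
names and the sample string (including its placeholder hash) are made up
for illustration and are not taken from the tracker.

```python
import re

# Per the note above: the issue is reported for Pacific (16.x) before 16.2.6.
FIXED_IN = (16, 2, 6)


def parse_ceph_version(text):
    """Extract (major, minor, patch) from a 'ceph version X.Y.Z ...' string."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Ceph version found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def may_be_affected(text):
    """Return True for Pacific releases older than 16.2.6."""
    version = parse_ceph_version(text)
    return version[0] == 16 and version < FIXED_IN


# Hypothetical sample output; the hash is a placeholder.
sample = "ceph version 16.2.5 (0000000) pacific (stable)"
print(may_be_affected(sample))
```

A False result only means the version is outside the range mentioned in
that tracker; it does not rule out other causes.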

On Tue, Oct 19, 2021 at 12:07 PM Denis Polom <denispolom@xxxxxxxxx> wrote:

> Hi,
>
> one of our monitor VMs was rebooted and is not rejoining the quorum (the
> quorum consists of 3 monitors). While the monitor service (ceph1) is
> running on this VM, the Ceph cluster becomes unreachable. In the monitor
> logs on the ceph3 VM I can see a lot of the following messages:
>
>
> 2021-10-19 17:50:19.555 7fe49e912700  0 log_channel(audit) log [DBG] :
> from='client.? 10.13.68.11:0/1846917599' entity='client.admin'
> cmd=[{"prefix": "osd blacklist ls"}]: dispatch
> 2021-10-19 17:50:20.255 7fe4a1117700  1 mon.ceph3@1(leader).paxos(paxos
> updating c 95374479..95375018) accept timeout, calling fresh election
> 2021-10-19 17:50:20.255 7fe49e912700  0 log_channel(cluster) log [INF] :
> mon.ceph3 calling monitor election
> 2021-10-19 17:50:20.255 7fe49e912700  1
> mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748
> 2021-10-19 17:50:20.263 7fe49e912700 -1 mon.ceph3@1(electing) e4 failed
> to get devid for : fallback method has serial ''but no model
> 2021-10-19 17:50:21.491 7fe49b90c700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:23.567 7fe49b90c700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:23.771 7fe49b90c700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:24.175 7fe49c90e700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:24.979 7fe49c90e700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:25.223 7fe49c90e700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:25.263 7fe4a1117700  1
> mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749,
> mid-election, bumping
> 2021-10-19 17:50:25.271 7fe49c90e700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
> 2021-10-19 17:50:25.279 7fe4a1117700 -1 mon.ceph3@1(electing) e4 failed
> to get devid for : fallback method has serial ''but no model
> 2021-10-19 17:50:25.487 7fe49c90e700  1 mon.ceph3@1(electing) e4
> handle_auth_request failed to assign global_id
>
>
> NTP is running on all nodes in the cluster and the clocks are in sync.
>
> Any help would be appreciated.
>
> thx!
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
