Re: monitor not joining quorum

Also, on the monitor that is unable to join the quorum, I see the following in its log file:

2021-10-19 16:22:07.629 7faec9dd2700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id
2021-10-19 16:22:08.193 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id
2021-10-19 16:22:09.565 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id
2021-10-19 16:22:11.885 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id
2021-10-19 16:22:14.233 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id
2021-10-19 16:22:14.889 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id
2021-10-19 16:22:16.365 7faec8dd0700  1 mon.ceph1@0(synchronizing) e4 handle_auth_request failed to assign global_id

any idea how to get this monitor to join the quorum?

thx!


On 10/19/21 18:23, denispolom@xxxxxxxxx wrote:
Hi Adam,

it's
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)

19. 10. 2021 18:19:29 Adam King <adking@xxxxxxxxxx>:

    Hi Denis,

    Which Ceph version is your cluster running on? I know there was an
    issue with mons getting dropped from the monmap (and therefore
    being stuck out of quorum) when their host was rebooted, in Pacific
    versions prior to 16.2.6: https://tracker.ceph.com/issues/51027. If
    you're on a Pacific version older than 16.2.6 it could be that same
    issue; workarounds are discussed in the tracker. Even if you are on
    16.2.6, the workarounds in that tracker could still be helpful.
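
    For reference, the general monmap-editing procedure from the Ceph
    documentation looks roughly like the sketch below. This is a hedged
    outline, not the tracker's exact workaround: the mon id (ceph1),
    the address, and the /tmp/monmap path are examples taken from this
    thread, and should be adapted to the actual cluster.

```shell
# Run on the host of the stuck monitor, with its service stopped.
systemctl stop ceph-mon@ceph1

# Extract the monmap currently held in the stuck mon's store.
ceph-mon -i ceph1 --extract-monmap /tmp/monmap

# Inspect it; the stuck mon may be missing from the map.
monmaptool --print /tmp/monmap

# If ceph1 is missing, add it back (address is an example).
monmaptool --add ceph1 10.13.68.11:6789 /tmp/monmap

# Inject the corrected map and restart the mon.
ceph-mon -i ceph1 --inject-monmap /tmp/monmap
systemctl start ceph-mon@ceph1
```

    Since these commands modify the mon store, it is prudent to back it
    up first and to compare the extracted map against one pulled from a
    healthy monitor before injecting anything.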

    On Tue, Oct 19, 2021 at 12:07 PM Denis Polom
    <denispolom@xxxxxxxxx> wrote:

        Hi,

        one of our monitor VMs was rebooted and is not joining the
        quorum again (the quorum consists of 3 monitors). While the
        monitor service (ceph1) is running on this VM, the Ceph cluster
        becomes unreachable. In the monitor logs on the ceph3 VM I can
        see a lot of the following messages:


        2021-10-19 17:50:19.555 7fe49e912700  0 log_channel(audit) log [DBG] : from='client.? 10.13.68.11:0/1846917599' entity='client.admin' cmd=[{"prefix": "osd blacklist ls"}]: dispatch
        2021-10-19 17:50:20.255 7fe4a1117700  1 mon.ceph3@1(leader).paxos(paxos updating c 95374479..95375018) accept timeout, calling fresh election
        2021-10-19 17:50:20.255 7fe49e912700  0 log_channel(cluster) log [INF] : mon.ceph3 calling monitor election
        2021-10-19 17:50:20.255 7fe49e912700  1 mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748
        2021-10-19 17:50:20.263 7fe49e912700 -1 mon.ceph3@1(electing) e4 failed to get devid for : fallback method has serial ''but no model
        2021-10-19 17:50:21.491 7fe49b90c700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:23.567 7fe49b90c700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:23.771 7fe49b90c700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:24.175 7fe49c90e700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:24.979 7fe49c90e700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:25.223 7fe49c90e700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:25.263 7fe4a1117700  1 mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749, mid-election, bumping
        2021-10-19 17:50:25.271 7fe49c90e700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id
        2021-10-19 17:50:25.279 7fe4a1117700 -1 mon.ceph3@1(electing) e4 failed to get devid for : fallback method has serial ''but no model
        2021-10-19 17:50:25.487 7fe49c90e700  1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id


        NTP is running on all cluster nodes and time is correctly
        synchronized.
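
        Since clock skew is another common reason for monitors staying
        out of quorum, it may be worth double-checking it from both
        sides. A sketch of the usual checks (chrony is assumed here;
        with classic ntpd, `ntpq -p` serves the same purpose):

```shell
# Ask the monitors themselves how much skew they observe.
ceph time-sync-status

# Check the local NTP offset on each mon host (chrony).
chronyc tracking
```

        If the mon-reported skew disagrees with what the NTP daemon
        reports, the local clock may have stepped during the reboot.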

        Any help would be appreciated.

        thx!

        _______________________________________________
        ceph-users mailing list -- ceph-users@xxxxxxx
        To unsubscribe send an email to ceph-users-leave@xxxxxxx




