Re: Problems with mon

If you’ve got all nodes up and running fine now, here’s what I did on my own
cluster just this morning.

1°/- Ensure all MONs have the same /etc/ceph/ceph.conf file.
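
For instance, a quick way to check that every MON has the same file (mon1,
mon2, mon3 being placeholders for your actual hostnames):

    for h in mon1 mon2 mon3; do ssh "$h" md5sum /etc/ceph/ceph.conf; done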

2°/- Most of the time your MONs share the same keyring; if so, ensure you’ve
got the right keyring in both places: /etc/ceph/ceph.mon.keyring and
/var/lib/ceph/mon/<clustername>-<hostname>/keyring
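
A quick way to compare the two (assuming the default cluster name "ceph" and
a mon id equal to the short hostname):

    diff /etc/ceph/ceph.mon.keyring /var/lib/ceph/mon/ceph-$(hostname -s)/keyring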

3°/- Delete the store and kv of your NOT HEALTHY MONs, found under
/var/lib/ceph/mon/<clustername>-<hostname>/; they will be rebuilt when the
mon process restarts.
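
On each NOT HEALTHY mon, something like this (systemd units shown as an
example; if your mons run in docker, stop the container instead):

    systemctl stop ceph-mon@$(hostname -s)
    # move the store aside rather than deleting it, so you can roll back
    mv /var/lib/ceph/mon/ceph-$(hostname -s)/store.db{,.bak}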

4°/- Start the last healthy monitor and wait for it to complain that it has
no way to acquire a global_id.

5°/- Start the remaining MONs.
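
For steps 4 and 5 the sequence could look like this (again assuming systemd
units named after the short hostname):

    # on the last healthy mon:
    systemctl start ceph-mon@$(hostname -s)
    journalctl -fu ceph-mon@$(hostname -s)   # watch for the global_id messages
    # then, on each remaining mon:
    systemctl start ceph-mon@$(hostname -s)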

You should see the quorum trigger a new election as soon as each mon detects
that it is part of an already existing cluster and retrieves the appropriate
data (store/kv/etc.) from the remaining healthy MON.
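
You can watch the election from any MON host, for example:

    ceph quorum_status --format json-pretty
    # or ask one mon directly through its admin socket:
    ceph daemon mon.$(hostname -s) mon_status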

This procedure can fail if your unhealthy MONs don’t get the appropriate
keyring.
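
If in doubt, you can list the keys a keyring actually contains with
ceph-authtool:

    ceph-authtool -l /var/lib/ceph/mon/ceph-$(hostname -s)/keyring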

On Tue, 13 Oct 2020 at 12:56, Mateusz Skała <mateusz.skala@xxxxxxxxx>
wrote:

> Hi,
> Thanks for responding. All monitors went down; 2 of 3 are actually up, but
> probably not in quorum. A quick look at what happened beforehand:
>
>    1. a few PGs without scrub and deep-scrub, 2 mons in the cluster
>    2. added one monitor (via ansible); ansible restarted the OSDs
>    3. the OS filesystem filled up on every node (because of multiple sst files)
>    4. all pods with monitors went down
>    5. added a new fs for the monitors and moved the data from the OS fs to it
>    6. 2 monitors started (the last one with a failure), but they don't
>    respond to any commands
>
> Regards
> Mateusz Skała
>
>
> On Tue, 13 Oct 2020 at 11:25, Gaël THEROND <gael.therond@xxxxxxxxxxxx>
> wrote:
>
>> This error means your quorum didn’t form.
>>
>> How many mon nodes do you usually have, and how many went down?
>>
>> On Tue, 13 Oct 2020 at 10:56, Mateusz Skała <mateusz.skala@xxxxxxxxx>
>> wrote:
>>
>>> Hello Community,
>>> I have problems with ceph-mons in docker. The docker pods are starting,
>>> but I get a lot of "e6 handle_auth_request failed to assign global_id"
>>> messages in the log. 2 mons are up but I can’t run any ceph commands.
>>> Regards
>>> Mateusz
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



