Re: 1 mon unable to join the quorum

2018-03-19 11:03:50.819493 7f842ed47640  0 mon.controller02 does not exist in monmap, will attempt to join an existing cluster
2018-03-19 11:03:50.820323 7f842ed47640  0 starting mon.controller02 rank -1 at 172.18.8.6:6789/0 mon_data /var/lib/ceph/mon/ceph-controller02 fsid f37f31b1-92c5-47c8-9834-1757a677d020

We are called 'mon.controller02' and we cannot find our name in the
local copy of the monmap.
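
If you want to pull the name and rank out of a startup line like the one
above without reading it by eye, something along these lines may help. It is
only a sketch: the regular expression is inferred from the single log line
quoted in this mail (not from any documented Ceph log format), and
parse_mon_start is a made-up helper name. It also tolerates a line that was
wrapped in transit.

```python
import re

# Pattern inferred from the one "starting mon..." line quoted in this thread;
# it is an assumption, not a documented log format.
START_RE = re.compile(
    r"starting (?P<entity>\S+) rank (?P<rank>-?\d+) at (?P<addr>\S+)"
)


def parse_mon_start(line):
    """Return entity, rank and addr from a mon startup line, or None."""
    # Collapse newlines and repeated spaces so wrapped lines still match.
    normalized = " ".join(line.split())
    match = START_RE.search(normalized)
    if match is None:
        return None
    return {
        "entity": match.group("entity"),
        "rank": int(match.group("rank")),
        "addr": match.group("addr"),
    }


# The startup line from the log above, as wrapped in this mail.
LINE = (
    "2018-03-19 11:03:50.820323 7f842ed47640  0 starting mon.controller02\n"
    "rank -1 at 172.18.8.6:6789/0 mon_data\n"
    "/var/lib/ceph/mon/ceph-controller02 fsid\n"
    "f37f31b1-92c5-47c8-9834-1757a677d020\n"
)

info = parse_mon_start(LINE)
if info is not None and info["rank"] < 0:
    print(info["entity"], "started with rank", info["rank"])
```

A negative rank here goes together with the "does not exist in monmap"
message on the previous log line.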

2018-03-19 11:03:52.346318 7f842735d700 10 mon.controller02@-1(probing) e68  ready to join, but i'm not in the monmap or my addr is blank, trying to join

Our name is not in the copy of the monmap we got from peer controller01 either.

$ cat ../controller02-mon_status.log
[root@controller02 ~]# ceph --admin-daemon /var/run/ceph/ceph-mon.controller02.asok mon_status
{
    "name": "controller02",
    "rank": 1,
    "state": "electing",
    "election_epoch": 32749,
    "quorum": [],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 71,
        "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
        "modified": "2018-03-29 10:48:06.371157",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "controller01",
                "addr": "172.18.8.5:6789\/0"
            },
            {
                "rank": 1,
                "name": "controller02",
                "addr": "172.18.8.6:6789\/0"
            },
            {
                "rank": 2,
                "name": "controller03",
                "addr": "172.18.8.7:6789\/0"
            }
        ]
    }
}

In the monmaps we are called 'controller02', not 'mon.controller02'.
These names need to be identical.
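
To run the same comparison on the mon_status output from each mon, a small
script like the following could be used. Treat it as a sketch under
assumptions: check_mon_name is a hypothetical helper, it only reads the
"monmap" -> "mons" -> "name" fields visible in the output above, and
stripping a leading "mon." is just my guess at how to line the two spellings
up for comparison.

```python
import json


def check_mon_name(mon_status_json, expected_name):
    """Compare a mon name against the names listed in a mon_status monmap."""
    status = json.loads(mon_status_json)
    names = [mon["name"] for mon in status["monmap"]["mons"]]
    # Assumption: the two spellings differ only by a leading "mon." prefix.
    if expected_name.startswith("mon."):
        bare = expected_name[len("mon."):]
    else:
        bare = expected_name
    return {
        "monmap_names": names,
        "exact_match": expected_name in names,
        "match_without_prefix": bare in names,
    }


# Trimmed copy of the mon_status output quoted above (raw string keeps "\/").
SAMPLE = r"""
{
    "name": "controller02",
    "rank": 1,
    "monmap": {
        "epoch": 71,
        "mons": [
            {"rank": 0, "name": "controller01", "addr": "172.18.8.5:6789\/0"},
            {"rank": 1, "name": "controller02", "addr": "172.18.8.6:6789\/0"},
            {"rank": 2, "name": "controller03", "addr": "172.18.8.7:6789\/0"}
        ]
    }
}
"""

report = check_mon_name(SAMPLE, "mon.controller02")
print(report)
```

With the output above, the exact comparison fails and only the comparison
without the prefix succeeds, which is the mismatch I am pointing at.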


On Thu, Mar 29, 2018 at 7:23 PM, Julien Lavesque
<julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
> Hi Brad,
>
> The results have been uploaded on the tracker
> (https://tracker.ceph.com/issues/23403)
>
> Julien
>
>
> On 29/03/2018 07:54, Brad Hubbard wrote:
>>
>> Can you update with the result of the following commands from all of the
>> MONs?
>>
>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status
>>
>> On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek
>> <gauvain.pocentek@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hello Ceph users,
>>>
>>> We are having a problem on a ceph cluster running Jewel: one of the mons
>>> left the quorum, and we have not been able to make it join again. The two
>>> other monitors are running just fine, but obviously we need this third one.
>>>
>>> The problem happened before Jewel, when the cluster was running
>>> Infernalis. We upgraded hoping that it would solve the problem, but no
>>> luck.
>>>
>>> We've validated several things: no network problem, no clock skew, same OS
>>> and ceph version everywhere. We've also removed the mon completely, and
>>> recreated it. We also tried to run an additional mon on one of the OSD
>>> machines, but this mon didn't join the quorum either.
>>>
>>> We've opened https://tracker.ceph.com/issues/23403 with logs from the 3
>>> mons during a fresh startup of the problematic mon.
>>>
>>> Is there anything we could try to do to resolve this issue? We are
>>> running out of ideas.
>>>
>>> We'd appreciate any suggestion!
>>>
>>> Gauvain Pocentek
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad


