Re: 1 mon unable to join the quorum

I'm not sure I completely understand your "test". What exactly are you
trying to achieve and what documentation are you following?

On Fri, Mar 30, 2018 at 10:49 PM, Julien Lavesque
<julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
> Brad,
>
> Thanks for your answer
>
> On 30/03/2018 02:09, Brad Hubbard wrote:
>>
>> 2018-03-19 11:03:50.819493 7f842ed47640  0 mon.controller02 does not
>> exist in monmap, will attempt to join an existing cluster
>> 2018-03-19 11:03:50.820323 7f842ed47640  0 starting mon.controller02
>> rank -1 at 172.18.8.6:6789/0 mon_data
>> /var/lib/ceph/mon/ceph-controller02 fsid
>> f37f31b1-92c5-47c8-9834-1757a677d020
>>
>> We are called 'mon.controller02' and we can not find our name in the
>> local copy of the monmap.
>>
>> 2018-03-19 11:03:52.346318 7f842735d700 10
>> mon.controller02@-1(probing) e68  ready to join, but i'm not in the
>> monmap or my addr is blank, trying to join
>>
>> Our name is not in the copy of the monmap we got from peer controller01
>> either.
>
>
> During our test we completely deleted the controller02 monitor and added it
> again.
>
> The log you have is from the moment controller02 was added (so it wasn't in
> the monmap before).
>
>
>>
>> $ cat ../controller02-mon_status.log
>> [root@controller02 ~]# ceph --admin-daemon
>> /var/run/ceph/ceph-mon.controller02.asok mon_status
>> {
>>     "name": "controller02",
>>     "rank": 1,
>>     "state": "electing",
>>     "election_epoch": 32749,
>>     "quorum": [],
>>     "outside_quorum": [],
>>     "extra_probe_peers": [],
>>     "sync_provider": [],
>>     "monmap": {
>>         "epoch": 71,
>>         "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
>>         "modified": "2018-03-29 10:48:06.371157",
>>         "created": "0.000000",
>>         "mons": [
>>             {
>>                 "rank": 0,
>>                 "name": "controller01",
>>                 "addr": "172.18.8.5:6789\/0"
>>             },
>>             {
>>                 "rank": 1,
>>                 "name": "controller02",
>>                 "addr": "172.18.8.6:6789\/0"
>>             },
>>             {
>>                 "rank": 2,
>>                 "name": "controller03",
>>                 "addr": "172.18.8.7:6789\/0"
>>             }
>>         ]
>>     }
>> }
>>
>> In the monmaps we are called 'controller02', not 'mon.controller02'.
>> These names need to be identical.
>>
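One way to run the same check mechanically is sketched below. It assumes that the "quorum" field of mon_status lists monitor ranks, and that the sample (copied from the output above, with some fields omitted) is representative. The helper name `summarize` is hypothetical, not part of any Ceph tooling.

```python
import json

# Sample copied from the mon_status output quoted above (some fields omitted).
SAMPLE = r'''
{
    "name": "controller02",
    "rank": 1,
    "state": "electing",
    "quorum": [],
    "monmap": {
        "epoch": 71,
        "mons": [
            {"rank": 0, "name": "controller01", "addr": "172.18.8.5:6789\/0"},
            {"rank": 1, "name": "controller02", "addr": "172.18.8.6:6789\/0"},
            {"rank": 2, "name": "controller03", "addr": "172.18.8.7:6789\/0"}
        ]
    }
}
'''


def summarize(mon_status_text):
    """Report whether this mon is listed in its own monmap and in the quorum."""
    status = json.loads(mon_status_text)
    names = [mon["name"] for mon in status["monmap"]["mons"]]
    return {
        "name": status["name"],
        "state": status["state"],
        # Is our own name present in the local copy of the monmap?
        "in_monmap": status["name"] in names,
        # Assumption: "quorum" holds the ranks of the mons currently in quorum.
        "in_quorum": status["rank"] in status["quorum"],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the sample above this reports the mon as present in the monmap but not in the quorum, which matches the "electing" state shown.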
>
> The cluster has been deployed using ceph-ansible with the servers'
> hostnames. All monitors are called mon.controller0x in the monmap, and all 3
> monitors have the same configuration.
>
> We see the same behavior when creating a monmap from scratch:
>
> [root@controller03 ~]# monmaptool --create --add controller01
> 172.18.8.5:6789 --add controller02 172.18.8.6:6789 --add controller03
> 172.18.8.7:6789 --fsid f37f31b1-92c5-47c8-9834-1757a677d020 --clobber
> test-monmap
> monmaptool: monmap file test-monmap
> monmaptool: set fsid to f37f31b1-92c5-47c8-9834-1757a677d020
> monmaptool: writing epoch 0 to test-monmap (3 monitors)
>
> [root@controller03 ~]# monmaptool --print test-monmap
> monmaptool: monmap file test-monmap
> epoch 0
> fsid f37f31b1-92c5-47c8-9834-1757a677d020
> last_changed 2018-03-30 14:42:18.809719
> created 2018-03-30 14:42:18.809719
> 0: 172.18.8.5:6789/0 mon.controller01
> 1: 172.18.8.6:6789/0 mon.controller02
> 2: 172.18.8.7:6789/0 mon.controller03
>
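The output above shows names added as "controller0x" being printed as "mon.controller0x". A minimal sketch for comparing the two forms follows. It assumes the "mon." in the `monmaptool --print` listing is only a display prefix, as this output suggests. The function name `parse_monmap_print` and the regular expression are illustrative, not part of Ceph.

```python
import re

# Matches lines such as "1: 172.18.8.6:6789/0 mon.controller02".
MON_LINE = re.compile(r"^(\d+):\s+(\S+)\s+mon\.(\S+)$")

# Sample copied from the `monmaptool --print test-monmap` output above.
SAMPLE = """\
monmaptool: monmap file test-monmap
epoch 0
fsid f37f31b1-92c5-47c8-9834-1757a677d020
last_changed 2018-03-30 14:42:18.809719
created 2018-03-30 14:42:18.809719
0: 172.18.8.5:6789/0 mon.controller01
1: 172.18.8.6:6789/0 mon.controller02
2: 172.18.8.7:6789/0 mon.controller03
"""


def parse_monmap_print(text):
    """Map bare monitor name (without "mon.") to a (rank, addr) tuple."""
    mons = {}
    for line in text.splitlines():
        match = MON_LINE.match(line.strip())
        if match:
            rank, addr, name = match.groups()
            mons[name] = (int(rank), addr)
    return mons


if __name__ == "__main__":
    for name, (rank, addr) in sorted(parse_monmap_print(SAMPLE).items()):
        print(rank, name, addr)
```

The bare names returned here can be compared directly with the "name" fields in the mon_status JSON.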
>
>>
>> On Thu, Mar 29, 2018 at 7:23 PM, Julien Lavesque
>> <julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi Brad,
>>>
>>> The results have been uploaded on the tracker
>>> (https://tracker.ceph.com/issues/23403)
>>>
>>> Julien
>>>
>>>
>>> On 29/03/2018 07:54, Brad Hubbard wrote:
>>>>
>>>>
>>>> Can you update with the result of the following commands from all of the
>>>> MONs?
>>>>
>>>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
>>>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status
>>>>
>>>> On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek
>>>> <gauvain.pocentek@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> Hello Ceph users,
>>>>>
>>>>> We are having a problem on a ceph cluster running Jewel: one of the mons
>>>>> left the quorum, and we have not been able to make it join again. The two
>>>>> other monitors are running just fine, but obviously we need this third
>>>>> one.
>>>>>
>>>>> The problem happened before Jewel, when the cluster was running
>>>>> Infernalis. We upgraded hoping that it would solve the problem, but no
>>>>> luck.
>>>>>
>>>>> We've validated several things: no network problem, no clock skew, same OS
>>>>> and ceph version everywhere. We've also removed the mon completely, and
>>>>> recreated it. We also tried to run an additional mon on one of the OSD
>>>>> machines; this mon didn't join the quorum either.
>>>>>
>>>>> We've opened https://tracker.ceph.com/issues/23403 with logs from the 3
>>>>> mons during a fresh startup of the problematic mon.
>>>>>
>>>>> Is there anything we could try to do to resolve this issue? We are running
>>>>> out of ideas.
>>>>>
>>>>> We'd appreciate any suggestion!
>>>>>
>>>>> Gauvain Pocentek
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad