I'm not sure I completely understand your "test". What exactly are you
trying to achieve and what documentation are you following?

On Fri, Mar 30, 2018 at 10:49 PM, Julien Lavesque
<julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
> Brad,
>
> Thanks for your answer
>
> On 30/03/2018 02:09, Brad Hubbard wrote:
>>
>> 2018-03-19 11:03:50.819493 7f842ed47640 0 mon.controller02 does not
>> exist in monmap, will attempt to join an existing cluster
>> 2018-03-19 11:03:50.820323 7f842ed47640 0 starting mon.controller02
>> rank -1 at 172.18.8.6:6789/0 mon_data
>> /var/lib/ceph/mon/ceph-controller02 fsid
>> f37f31b1-92c5-47c8-9834-1757a677d020
>>
>> We are called 'mon.controller02' and we can not find our name in the
>> local copy of the monmap.
>>
>> 2018-03-19 11:03:52.346318 7f842735d700 10
>> mon.controller02@-1(probing) e68 ready to join, but i'm not in the
>> monmap or my addr is blank, trying to join
>>
>> Our name is not in the copy of the monmap we got from peer controller01
>> either.
>
>
> During our test we completely deleted the controller02 monitor and added
> it again.
>
> The log you have is from when controller02 was added (so it wasn't in the
> monmap before)
>
>
>>
>> $ cat ../controller02-mon_status.log
>> [root@controller02 ~]# ceph --admin-daemon
>> /var/run/ceph/ceph-mon.controller02.asok mon_status
>> {
>>     "name": "controller02",
>>     "rank": 1,
>>     "state": "electing",
>>     "election_epoch": 32749,
>>     "quorum": [],
>>     "outside_quorum": [],
>>     "extra_probe_peers": [],
>>     "sync_provider": [],
>>     "monmap": {
>>         "epoch": 71,
>>         "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
>>         "modified": "2018-03-29 10:48:06.371157",
>>         "created": "0.000000",
>>         "mons": [
>>             {
>>                 "rank": 0,
>>                 "name": "controller01",
>>                 "addr": "172.18.8.5:6789\/0"
>>             },
>>             {
>>                 "rank": 1,
>>                 "name": "controller02",
>>                 "addr": "172.18.8.6:6789\/0"
>>             },
>>             {
>>                 "rank": 2,
>>                 "name": "controller03",
>>                 "addr": "172.18.8.7:6789\/0"
>>             }
>>         ]
>>     }
>> }
>>
>> In the monmaps we are called 'controller02', not 'mon.controller02'.
>> These names need to be identical.
>>
>
> The cluster has been deployed using ceph-ansible with the servers' hostnames.
> All monitors are called mon.controller0x in the monmap and all 3
> monitors have the same configuration
>
> We see the same behavior when creating a monmap from scratch:
>
> [root@controller03 ~]# monmaptool --create --add controller01
> 172.18.8.5:6789 --add controller02 172.18.8.6:6789 --add controller03
> 172.18.8.7:6789 --fsid f37f31b1-92c5-47c8-9834-1757a677d020 --clobber
> test-monmap
> monmaptool: monmap file test-monmap
> monmaptool: set fsid to f37f31b1-92c5-47c8-9834-1757a677d020
> monmaptool: writing epoch 0 to test-monmap (3 monitors)
>
> [root@controller03 ~]# monmaptool --print test-monmap
> monmaptool: monmap file test-monmap
> epoch 0
> fsid f37f31b1-92c5-47c8-9834-1757a677d020
> last_changed 2018-03-30 14:42:18.809719
> created 2018-03-30 14:42:18.809719
> 0: 172.18.8.5:6789/0 mon.controller01
> 1: 172.18.8.6:6789/0 mon.controller02
> 2: 172.18.8.7:6789/0 mon.controller03
>
>
>>
>> On Thu, Mar 29, 2018 at 7:23 PM, Julien Lavesque
>> <julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi Brad,
>>>
>>> The results have been uploaded on the tracker
>>> (https://tracker.ceph.com/issues/23403)
>>>
>>> Julien
>>>
>>>
>>> On 29/03/2018 07:54, Brad Hubbard wrote:
>>>>
>>>>
>>>> Can you update with the result of the following commands from all of the
>>>> MONs?
>>>>
>>>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
>>>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok
>>>> quorum_status
>>>>
>>>> On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek
>>>> <gauvain.pocentek@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> Hello Ceph users,
>>>>>
>>>>> We are having a problem on a ceph cluster running Jewel: one of the
>>>>> mons left the quorum, and we have not been able to make it join
>>>>> again. The two other monitors are running just fine, but obviously
>>>>> we need this third one.
>>>>>
>>>>> The problem happened before Jewel, when the cluster was running
>>>>> Infernalis.
>>>>> We upgraded hoping that it would solve the problem, but no luck.
>>>>>
>>>>> We've validated several things: no network problem, no clock skew,
>>>>> same OS and ceph version everywhere. We've also removed the mon
>>>>> completely, and recreated it. We also tried to run an additional mon
>>>>> on one of the OSD machines; this mon didn't join the quorum either.
>>>>>
>>>>> We've opened https://tracker.ceph.com/issues/23403 with logs from the
>>>>> 3 mons during a fresh startup of the problematic mon.
>>>>>
>>>>> Is there anything we could try to do to resolve this issue? We are
>>>>> running out of ideas.
>>>>>
>>>>> We'd appreciate any suggestion!
>>>>>
>>>>> Gauvain Pocentek
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad
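
For anyone comparing mon_status output across monitors as requested in this thread, the check can be sketched in a few lines of Python. This is only an illustration, not part of any Ceph tooling: the helper name check_mon_status and the SAMPLE constant are hypothetical, the sample is the controller02 output quoted in the thread, and stripping a leading "mon." before comparing names is an assumption based on the monmaptool --print output above, which shows that prefix for monitors added by their bare hostname.

```python
import json


def check_mon_status(raw, expected_name):
    """Summarise one monitor's mon_status JSON (hypothetical helper)."""
    status = json.loads(raw)
    # Assumption: "mon." is a display prefix, so drop it before comparing.
    if expected_name.startswith("mon."):
        short = expected_name[len("mon."):]
    else:
        short = expected_name
    monmap_names = [mon["name"] for mon in status["monmap"]["mons"]]
    return {
        "name_matches": status["name"] == short,
        "in_monmap": short in monmap_names,
        # "quorum" lists the ranks currently in quorum.
        "in_quorum": status["rank"] in status["quorum"],
        "state": status["state"],
        "monmap_epoch": status["monmap"]["epoch"],
    }


# Raw string so the JSON "\/" escapes reach json.loads unchanged.
SAMPLE = r'''
{
    "name": "controller02",
    "rank": 1,
    "state": "electing",
    "election_epoch": 32749,
    "quorum": [],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 71,
        "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
        "modified": "2018-03-29 10:48:06.371157",
        "created": "0.000000",
        "mons": [
            {"rank": 0, "name": "controller01", "addr": "172.18.8.5:6789\/0"},
            {"rank": 1, "name": "controller02", "addr": "172.18.8.6:6789\/0"},
            {"rank": 2, "name": "controller03", "addr": "172.18.8.7:6789\/0"}
        ]
    }
}
'''

if __name__ == "__main__":
    for key, value in check_mon_status(SAMPLE, "mon.controller02").items():
        print(f"{key}: {value}")
```

Running the same function over the mon_status output of each monitor and comparing monmap_epoch and state side by side is one way to spot a monitor that is listed in the monmap but stuck outside the quorum.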