See my latest update in the tracker.

On Sun, Apr 1, 2018 at 2:27 AM, Julien Lavesque
<julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
> The cluster was initially deployed with ceph-ansible, on the Infernalis
> version. For some unknown reason controller02 was out of the quorum and
> we were unable to bring it back in.
>
> We updated the cluster to the Jewel version using the rolling-update
> playbook from ceph-ansible.
>
> controller02 was still not in the quorum.
>
> We tried deleting the mon completely and adding it again using the
> manual method from
> http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-mons/
> (with id controller02).
>
> The logs provided are from when controller02 was added with the manual
> method.
>
> But controller02 still won't join the cluster.
>
> Hope it helps you understand.
>
> On 31/03/2018 02:12, Brad Hubbard wrote:
>> I'm not sure I completely understand your "test". What exactly are you
>> trying to achieve and what documentation are you following?
>>
>> On Fri, Mar 30, 2018 at 10:49 PM, Julien Lavesque
>> <julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
>>> Brad,
>>>
>>> Thanks for your answer.
>>>
>>> On 30/03/2018 02:09, Brad Hubbard wrote:
>>>> 2018-03-19 11:03:50.819493 7f842ed47640 0 mon.controller02 does not
>>>> exist in monmap, will attempt to join an existing cluster
>>>> 2018-03-19 11:03:50.820323 7f842ed47640 0 starting mon.controller02
>>>> rank -1 at 172.18.8.6:6789/0 mon_data
>>>> /var/lib/ceph/mon/ceph-controller02 fsid
>>>> f37f31b1-92c5-47c8-9834-1757a677d020
>>>>
>>>> We are called 'mon.controller02' and we cannot find our name in the
>>>> local copy of the monmap.
>>>>
>>>> 2018-03-19 11:03:52.346318 7f842735d700 10
>>>> mon.controller02@-1(probing) e68 ready to join, but i'm not in the
>>>> monmap or my addr is blank, trying to join
>>>>
>>>> Our name is not in the copy of the monmap we got from peer
>>>> controller01 either.
>>>
>>> During our test we deleted the controller02 monitor completely and
>>> added it again.
>>>
>>> The log you have is from when controller02 was added (so it wasn't in
>>> the monmap before).
>>>
>>>> $ cat ../controller02-mon_status.log
>>>> [root@controller02 ~]# ceph --admin-daemon \
>>>>     /var/run/ceph/ceph-mon.controller02.asok mon_status
>>>> {
>>>>     "name": "controller02",
>>>>     "rank": 1,
>>>>     "state": "electing",
>>>>     "election_epoch": 32749,
>>>>     "quorum": [],
>>>>     "outside_quorum": [],
>>>>     "extra_probe_peers": [],
>>>>     "sync_provider": [],
>>>>     "monmap": {
>>>>         "epoch": 71,
>>>>         "fsid": "f37f31b1-92c5-47c8-9834-1757a677d020",
>>>>         "modified": "2018-03-29 10:48:06.371157",
>>>>         "created": "0.000000",
>>>>         "mons": [
>>>>             {
>>>>                 "rank": 0,
>>>>                 "name": "controller01",
>>>>                 "addr": "172.18.8.5:6789\/0"
>>>>             },
>>>>             {
>>>>                 "rank": 1,
>>>>                 "name": "controller02",
>>>>                 "addr": "172.18.8.6:6789\/0"
>>>>             },
>>>>             {
>>>>                 "rank": 2,
>>>>                 "name": "controller03",
>>>>                 "addr": "172.18.8.7:6789\/0"
>>>>             }
>>>>         ]
>>>>     }
>>>> }
>>>>
>>>> In the monmaps we are called 'controller02', not 'mon.controller02'.
>>>> These names need to be identical.
>>>
>>> The cluster has been deployed using ceph-ansible with the servers'
>>> hostnames. All monitors are called mon.controller0x in the monmap, and
>>> all three monitors have the same configuration.
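>>>
>>> For reference, each node's locally stored copy of the monmap can be
>>> dumped and compared with something like the following sketch (the mon
>>> has to be stopped first, and /tmp/monmap is just an arbitrary output
>>> path):
>>>
>>> # stop the mon, then dump its local monmap and print it
>>> [root@controller02 ~]# ceph-mon -i controller02 --extract-monmap /tmp/monmap
>>> [root@controller02 ~]# monmaptool --print /tmp/monmap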
>>>
>>> We see the same behavior when creating a monmap from scratch:
>>>
>>> [root@controller03 ~]# monmaptool --create --add controller01 \
>>>     172.18.8.5:6789 --add controller02 172.18.8.6:6789 \
>>>     --add controller03 172.18.8.7:6789 \
>>>     --fsid f37f31b1-92c5-47c8-9834-1757a677d020 --clobber test-monmap
>>> monmaptool: monmap file test-monmap
>>> monmaptool: set fsid to f37f31b1-92c5-47c8-9834-1757a677d020
>>> monmaptool: writing epoch 0 to test-monmap (3 monitors)
>>>
>>> [root@controller03 ~]# monmaptool --print test-monmap
>>> monmaptool: monmap file test-monmap
>>> epoch 0
>>> fsid f37f31b1-92c5-47c8-9834-1757a677d020
>>> last_changed 2018-03-30 14:42:18.809719
>>> created 2018-03-30 14:42:18.809719
>>> 0: 172.18.8.5:6789/0 mon.controller01
>>> 1: 172.18.8.6:6789/0 mon.controller02
>>> 2: 172.18.8.7:6789/0 mon.controller03
>>>
>>>> On Thu, Mar 29, 2018 at 7:23 PM, Julien Lavesque
>>>> <julien.lavesque@xxxxxxxxxxxxxxxxxx> wrote:
>>>>> Hi Brad,
>>>>>
>>>>> The results have been uploaded to the tracker
>>>>> (https://tracker.ceph.com/issues/23403).
>>>>>
>>>>> Julien
>>>>>
>>>>> On 29/03/2018 07:54, Brad Hubbard wrote:
>>>>>> Can you update with the result of the following commands from all
>>>>>> of the MONs?
>>>>>>
>>>>>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok mon_status
>>>>>> # ceph --admin-daemon /var/run/ceph/ceph-mon.[whatever].asok quorum_status
>>>>>>
>>>>>> On Thu, Mar 29, 2018 at 3:11 PM, Gauvain Pocentek
>>>>>> <gauvain.pocentek@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>> Hello Ceph users,
>>>>>>>
>>>>>>> We are having a problem on a Ceph cluster running Jewel: one of
>>>>>>> the mons left the quorum, and we have not been able to make it
>>>>>>> join again. The two other monitors are running just fine, but
>>>>>>> obviously we need this third one.
>>>>>>>
>>>>>>> The problem happened before Jewel, when the cluster was running
>>>>>>> Infernalis. We upgraded hoping that it would solve the problem,
>>>>>>> but no luck.
>>>>>>>
>>>>>>> We've validated several things: no network problem, no clock
>>>>>>> skew, same OS and Ceph version everywhere. We've also removed
>>>>>>> the mon completely and recreated it. We also tried to run an
>>>>>>> additional mon on one of the OSD machines; this mon didn't join
>>>>>>> the quorum either.
>>>>>>>
>>>>>>> We've opened https://tracker.ceph.com/issues/23403 with logs from
>>>>>>> the 3 mons during a fresh startup of the problematic mon.
>>>>>>>
>>>>>>> Is there anything we could try to resolve this issue? We are
>>>>>>> getting out of ideas.
>>>>>>>
>>>>>>> We'd appreciate any suggestion!
>>>>>>>
>>>>>>> Gauvain Pocentek

--
Cheers,
Brad
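
[For reference, the manual remove/re-add sequence from the
add-or-rm-mons document linked above is roughly the following sketch;
the /tmp paths and the systemd unit name are assumptions to adjust for
the environment:

# ceph mon remove controller02               # drop any stale monmap entry
# ceph auth get mon. -o /tmp/mon.keyring     # fetch the mon. keyring
# ceph mon getmap -o /tmp/monmap             # fetch the current monmap
# ceph-mon -i controller02 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
# ceph mon add controller02 172.18.8.6:6789  # register the mon in the map
# systemctl start ceph-mon@controller02]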