On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
> Not a healthy monitor means that I cannot get a monmap from any of them!

If you look at the procedure at

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

you'll notice that you do not need any running monitors--it extracts the
monmap from the data directory. This procedure should let you remove all
trace of the new monitor so that the original works as before.

sage
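For reference, the unhealthy-cluster procedure behind that link reduces to
roughly the following sketch (mon ids "fu" and "jin" are taken from the
monmap shown in this thread; /tmp/monmap is just a scratch path):

    # stop the surviving monitor first
    /etc/init.d/ceph stop mon.fu

    # extract the monmap from the mon's own data directory (no quorum needed)
    ceph-mon -i fu --extract-monmap /tmp/monmap

    # inspect it, then drop the half-added monitor
    monmaptool --print /tmp/monmap
    monmaptool /tmp/monmap --rm jin

    # inject the trimmed map back and restart
    ceph-mon -i fu --inject-monmap /tmp/monmap
    /etc/init.d/ceph start mon.fu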
> and none of the commands ceph health etc. are working.
>
> Best,
>
> George
>
> > Yes Sage!
> >
> > Priority is to fix things!
> >
> > Right now I don't have a healthy monitor!
> >
> > Can I remove all of them and add the first one from scratch?
> >
> > What would that mean about the data??
> >
> > Best,
> >
> > George
> >
> > > On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
> > > > This is the message that is flooding the ceph-mon.log now:
> > > >
> > > > 2015-03-14 08:16:39.286823 7f9f6920b700  1 mon.fu@0(electing).elector(1) init, last seen epoch 1
> > > > 2015-03-14 08:16:42.736674 7f9f6880a700  1 mon.fu@0(electing) e2 adding peer 15.12.6.21:6789/0 to list of hints
> > > > 2015-03-14 08:16:42.737891 7f9f6880a700  1 mon.fu@0(electing).elector(1) discarding election message: 15.12.6.21:6789/0 not in my monmap e2: 2 mons at {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}
> > >
> > > It sounds like you need to follow some variation of this procedure:
> > >
> > > http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
> > >
> > > ...although it may be that simply killing the daemon running on 15.12.6.21
> > > and restarting the other mon daemons will be enough. If not, the
> > > procedure linked above will let you remove all traces of it and get
> > > things up again.
> > >
> > > Not quite sure where things went awry but I assume the priority is to get
> > > things working first and figure that out later!
> > >
> > > sage
> > >
> > > > George
> > > >
> > > > > This is the log for monitor (ceph-mon.log) when I try to restart the
> > > > > monitor:
> > > > >
> > > > > 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 *** Got Signal Terminated ***
> > > > > 2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2 shutdown
> > > > > 2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service shutdown
> > > > > 2015-03-14 07:47:26.384657 7f1f1dc0f700  0 mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1 services
> > > > > 2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service shutdown
> > > > > 2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid 17050
> > > > > 2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0 at 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
> > > > > 2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2 preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
> > > > > 2015-03-14 07:47:27.704504 7fc04b4437a0  1 mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh upgraded, format 0 -> 1
> > > > > 2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg v0 on_upgrade discarding in-core PGMap
> > > > > 2015-03-14 07:47:27.837060 7fc04b4437a0  0 mon.fu@-1(probing).mds e104 print_map
> > > > > epoch 104
> > > > > flags 0
> > > > > created 2014-11-30 01:58:17.176938
> > > > > modified 2015-03-14 06:07:05.683239
> > > > > tableserver 0
> > > > > root 0
> > > > > session_timeout 60
> > > > > session_autoclose 300
> > > > > max_file_size 1099511627776
> > > > > last_failure 0
> > > > > last_failure_osd_epoch 1760
> > > > > compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
> > > > > max_mds 1
> > > > > in 0
> > > > > up {0=59315}
> > > > > failed
> > > > > stopped
> > > > > data_pools 3
> > > > > metadata_pool 4
> > > > > inline_data disabled
> > > > > 59315: 15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9
> > > > >
> > > > > 2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > 2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > 2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > 2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > > 2015-03-14 07:47:27.839054 7fc04b4437a0  1 mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded, format 0 -> 1
> > > > > 2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2 my rank is now 0 (was -1)
> > > > > 2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 192.168.1.100:6789/0 >> 192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0 c=0x38c0dc0).fault
> > > > >
> > > > >> I can no longer start my OSDs :-@
> > > > >>
> > > > >> failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.6
> > > > >> --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move --
> > > > >> 6 3.63 host=fu root=default'
> > > > >>
> > > > >> Please help!!!
> > > > >>
> > > > >> George
> > > > >>
> > > > >>> ceph mon add stops at this:
> > > > >>>
> > > > >>> [jin][INFO ] Running command: sudo ceph mon getmap -o /var/lib/ceph/tmp/ceph.raijin.monmap
> > > > >>>
> > > > >>> and never gets over it!!!!!
> > > > >>>
> > > > >>> Any help??
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> George
> > > > >>>
> > > > >>>> Guys, any help much appreciated because my cluster is down :-(
> > > > >>>>
> > > > >>>> After trying ceph mon add, which didn't complete since it was stuck
> > > > >>>> forever here:
> > > > >>>>
> > > > >>>> [jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0 monclient: hunting for new mon
> > > > >>>> ^CKilled by signal 2.
> > > > >>>> [ceph_deploy][ERROR ] KeyboardInterrupt
> > > > >>>>
> > > > >>>> the previously healthy node is now down completely :-(
> > > > >>>>
> > > > >>>> $ ceph mon stat
> > > > >>>> 2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 192.168.1.100:0/1042061 >> 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff248000e90).fault
> > > > >>>> ^CError connecting to cluster: InterruptedOrTimeoutError
> > > > >>>>
> > > > >>>> Any ideas??
> > > > >>>>
> > > > >>>> All the best,
> > > > >>>>
> > > > >>>> George
> > > > >>>>
> > > > >>>>> Georgios,
> > > > >>>>>
> > > > >>>>> you need to be on the "deployment server" and cd into the folder that
> > > > >>>>> you used originally while deploying CEPH - in this folder you should
> > > > >>>>> already have ceph.conf, the client.admin keyring and other stuff - which
> > > > >>>>> is required to connect to the cluster... and provision new MONs or
> > > > >>>>> OSDs, etc.
> > > > >>>>>
> > > > >>>>> Message:
> > > > >>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new to
> > > > >>>>> create a new cluster...
> > > > >>>>>
> > > > >>>>> ...means (if I'm not mistaken) that you are running ceph-deploy from
> > > > >>>>> NOT the original folder...
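As a concrete sketch of the above (the directory name is only an example;
"jin" is the node from this thread):

    # on the admin host, from the directory used for the original
    # "ceph-deploy new" run -- it should still contain ceph.conf,
    # ceph.mon.keyring and the client.admin keyring
    cd ~/my-cluster
    ls ceph.conf ceph.mon.keyring ceph.client.admin.keyring

    # only then grow the monitor set
    ceph-deploy mon add jin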
> > > > >>>>> On 13 March 2015 at 23:03, Georgios Dimitrakakis wrote:
> > > > >>>>>
> > > > >>>>>> Not a firewall problem!! Firewall is disabled ...
> > > > >>>>>>
> > > > >>>>>> Loic, I've tried mon create because of this:
> > > > >>>>>>
> > > > >>>>>> http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors
> > > > >>>>>>
> > > > >>>>>> Should I first create and then add?? What is the proper order???
> > > > >>>>>> Should I do it from the already existing monitor node or can I run
> > > > >>>>>> it from the new one?
> > > > >>>>>>
> > > > >>>>>> If I try add from the beginning I am getting this:
> > > > >>>>>>
> > > > >>>>>> [ceph_deploy.conf][DEBUG ] found configuration file at: /home/.cephdeploy.conf
> > > > >>>>>> [ceph_deploy.cli][INFO ] Invoked (1.5.22): /usr/bin/ceph-deploy mon add jin
> > > > >>>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new to create a new cluster
> > > > >>>>>>
> > > > >>>>>> Regards,
> > > > >>>>>>
> > > > >>>>>> George
> > > > >>>>>>
> > > > >>>>>>> Hi,
> > > > >>>>>>>
> > > > >>>>>>> I think ceph-deploy mon add (instead of create) is what you
> > > > >>>>>>> should be using.
> > > > >>>>>>>
> > > > >>>>>>> Cheers
> > > > >>>>>>>
> > > > >>>>>>> On 13/03/2015 22:25, Georgios Dimitrakakis wrote:
> > > > >>>>>>>
> > > > >>>>>>>> On an already available cluster I've tried to add a new monitor!
> > > > >>>>>>>>
> > > > >>>>>>>> I have used ceph-deploy mon create {NODE}
> > > > >>>>>>>>
> > > > >>>>>>>> where {NODE} = the name of the node
> > > > >>>>>>>>
> > > > >>>>>>>> and then I restarted the /etc/init.d/ceph service with success at the
> > > > >>>>>>>> node, where it showed that the monitor is running:
> > > > >>>>>>>>
> > > > >>>>>>>> # /etc/init.d/ceph restart
> > > > >>>>>>>> === mon.jin ===
> > > > >>>>>>>> === mon.jin ===
> > > > >>>>>>>> Stopping Ceph mon.jin on jin...kill 36388...done
> > > > >>>>>>>> === mon.jin ===
> > > > >>>>>>>> Starting Ceph mon.jin on jin...
> > > > >>>>>>>> Starting ceph-create-keys on jin...
> > > > >>>>>>>>
> > > > >>>>>>>> But checking the quorum it doesn't show the newly added monitor!
> > > > >>>>>>>>
> > > > >>>>>>>> Plus ceph mon stat gives out only 1 monitor!!!
> > > > >>>>>>>>
> > > > >>>>>>>> # ceph mon stat
> > > > >>>>>>>> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu
> > > > >>>>>>>>
> > > > >>>>>>>> Any ideas on what I have done wrong???
> > > > >>>>>>>>
> > > > >>>>>>>> Regards,
> > > > >>>>>>>>
> > > > >>>>>>>> George
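A note on the create-vs-add confusion above: with ceph-deploy of that
vintage, "mon create" starts a daemon on the node but may not by itself get
it into an existing cluster's monmap; expanding the quorum is what "mon add"
is for. Roughly (node name from this thread):

    # from the original deployment directory on the admin host
    ceph-deploy mon add jin

    # then verify that the monmap and the quorum actually grew
    ceph mon stat
    ceph quorum_status --format json-pretty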
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com