Best,
George
> Yes Sage!
>
> Priority is to fix things!
>
> Right now I don't have a healthy monitor!
>
> Can I remove all of them and add the first one from scratch?
>
> What would that mean for the data??
>
> Best,
>
> George
>
> > On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
> > > This is the message that is flooding the ceph-mon.log now:
> > >
> > >
> > > 2015-03-14 08:16:39.286823 7f9f6920b700 1 mon.fu@0(electing).elector(1) init, last seen epoch 1
> > > 2015-03-14 08:16:42.736674 7f9f6880a700 1 mon.fu@0(electing) e2 adding peer 15.12.6.21:6789/0 to list of hints
> > > 2015-03-14 08:16:42.737891 7f9f6880a700 1 mon.fu@0(electing).elector(1) discarding election message: 15.12.6.21:6789/0 not in my monmap e2: 2 mons at {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}
> >
> > It sounds like you need to follow some variation of this procedure:
> >
> > http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
> >
> > ...although it may be that simply killing the daemon running on
> > 15.12.6.21 and restarting the other mon daemons will be enough. If not,
> > the procedure linked above will let you remove all traces of it and get
> > things up again.
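> >
> > Roughly, that unhealthy-cluster procedure comes down to something like
> > the sketch below (assuming "fu" is the mon you want to keep and "jin"
> > the one to drop, as in the monmap from your logs; /tmp/monmap is just an
> > example path, and back up /var/lib/ceph/mon before touching anything):
> >
> >   # on fu's host, with the ceph-mon daemons stopped
> >   ceph-mon -i fu --extract-monmap /tmp/monmap
> >   monmaptool /tmp/monmap --print       # see which mons are in the map
> >   monmaptool /tmp/monmap --rm jin      # remove the unwanted monitor
> >   ceph-mon -i fu --inject-monmap /tmp/monmap
> >   # then start mon.fu again and check 'ceph mon stat'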
> >
> > Not quite sure where things went awry but I assume the priority is to
> > get things working first and figure that out later!
> >
> > sage
> >
> > >
> > >
> > >
> > > George
> > >
> > >
> > > > This is the log from the monitor (ceph-mon.log) when I try to
> > > > restart it:
> > > >
> > > >
> > > > 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 *** Got Signal Terminated ***
> > > > 2015-03-14 07:47:26.384593 7f1f1dc0f700 1 mon.fu@0(probing) e2 shutdown
> > > > 2015-03-14 07:47:26.384654 7f1f1dc0f700 0 quorum service shutdown
> > > > 2015-03-14 07:47:26.384657 7f1f1dc0f700 0 mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1 services
> > > > 2015-03-14 07:47:26.384665 7f1f1dc0f700 0 quorum service shutdown
> > > > 2015-03-14 07:47:27.620670 7fc04b4437a0 0 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid 17050
> > > > 2015-03-14 07:47:27.703151 7fc04b4437a0 0 starting mon.fu rank 0 at 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
> > > > 2015-03-14 07:47:27.703421 7fc04b4437a0 1 mon.fu@-1(probing) e2 preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
> > > > 2015-03-14 07:47:27.704504 7fc04b4437a0 1 mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh upgraded, format 0 -> 1
> > > > 2015-03-14 07:47:27.704525 7fc04b4437a0 1 mon.fu@-1(probing).pg v0 on_upgrade discarding in-core PGMap
> > > > 2015-03-14 07:47:27.837060 7fc04b4437a0 0 mon.fu@-1(probing).mds e104 print_map
> > > > epoch 104
> > > > flags 0
> > > > created 2014-11-30 01:58:17.176938
> > > > modified 2015-03-14 06:07:05.683239
> > > > tableserver 0
> > > > root 0
> > > > session_timeout 60
> > > > session_autoclose 300
> > > > max_file_size 1099511627776
> > > > last_failure 0
> > > > last_failure_osd_epoch 1760
> > > > compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
> > > > max_mds 1
> > > > in 0
> > > > up {0=59315}
> > > > failed
> > > > stopped
> > > > data_pools 3
> > > > metadata_pool 4
> > > > inline_data disabled
> > > > 59315: 15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9
> > > >
> > > > 2015-03-14 07:47:27.837972 7fc04b4437a0 0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > 2015-03-14 07:47:27.837990 7fc04b4437a0 0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > 2015-03-14 07:47:27.837996 7fc04b4437a0 0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > 2015-03-14 07:47:27.838003 7fc04b4437a0 0 mon.fu@-1(probing).osd e1768 crush map has features 1107558400, adjusting msgr requires
> > > > 2015-03-14 07:47:27.839054 7fc04b4437a0 1 mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded, format 0 -> 1
> > > > 2015-03-14 07:47:27.840052 7fc04b4437a0 0 mon.fu@-1(probing) e2 my rank is now 0 (was -1)
> > > > 2015-03-14 07:47:27.840512 7fc045ef5700 0 -- 192.168.1.100:6789/0 >> 192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0 c=0x38c0dc0).fault
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >> I can no longer start my OSDs :-@
> > > >>
> > > >>
> > > >> failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move -- 6 3.63 host=fu root=default'
> > > >>
> > > >>
> > > >> Please help!!!
> > > >>
> > > >> George
> > > >>
> > > >>> ceph mon add stops at this:
> > > >>>
> > > >>>
> > > >>> [jin][INFO ] Running command: sudo ceph mon getmap -o /var/lib/ceph/tmp/ceph.raijin.monmap
> > > >>>
> > > >>>
> > > >>> and never gets past it!!!!!
> > > >>>
> > > >>>
> > > >>> Any help??
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>>
> > > >>> George
> > > >>>
> > > >>>> Guys, any help much appreciated because my cluster is down :-(
> > > >>>>
> > > >>>> After trying ceph mon add, which didn't complete since it was stuck
> > > >>>> forever here:
> > > >>>>
> > > >>>> [jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700 0 monclient: hunting for new mon
> > > >>>> ^CKilled by signal 2.
> > > >>>> [ceph_deploy][ERROR ] KeyboardInterrupt
> > > >>>>
> > > >>>>
> > > >>>> the previously healthy node is now down completely :-(
> > > >>>>
> > > >>>> $ ceph mon stat
> > > >>>> 2015-03-14 07:21:37.782360 7ff2545b1700 0 -- 192.168.1.100:0/1042061 >> 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7ff248000e90).fault
> > > >>>> ^CError connecting to cluster: InterruptedOrTimeoutError
> > > >>>>
> > > >>>>
> > > >>>> Any ideas??
> > > >>>>
> > > >>>>
> > > >>>> All the best,
> > > >>>>
> > > >>>> George
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> Georgios,
> > > >>>>>
> > > >>>>> you need to be on the "deployment server" and cd into the folder
> > > >>>>> that you originally used while deploying Ceph - in this folder you
> > > >>>>> should already have ceph.conf, the client.admin keyring and other
> > > >>>>> stuff which is required to connect to the cluster... and provision
> > > >>>>> new MONs or OSDs, etc.
> > > >>>>>
> > > >>>>> The message:
> > > >>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new
> > > >>>>> to create a new cluster...
> > > >>>>>
> > > >>>>> ...means (if I'm not mistaken) that you are running ceph-deploy
> > > >>>>> from a folder that is NOT the original one...
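> > > >>>>>
> > > >>>>> For example (the directory name here is just a typical ceph-deploy
> > > >>>>> working dir; the files listed are the ones ceph-deploy normally
> > > >>>>> leaves behind):
> > > >>>>>
> > > >>>>>   cd ~/my-cluster   # wherever you originally ran ceph-deploy from
> > > >>>>>   ls ceph.conf ceph.mon.keyring ceph.client.admin.keyring
> > > >>>>>   ceph-deploy mon add jin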
> > > >>>>>
> > > >>>>> On 13 March 2015 at 23:03, Georgios Dimitrakakis wrote:
> > > >>>>>
> > > >>>>>> Not a firewall problem!! Firewall is disabled ...
> > > >>>>>>
> > > >>>>>> Loic, I've tried mon create because of this:
> > > >>>>>>
> > > >>>>>> http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors
> > > >>>>>>
> > > >>>>>> Should I first create and then add?? What is the proper order???
> > > >>>>>> Should I do it from the already existing monitor node or can I run
> > > >>>>>> it from the new one?
> > > >>>>>>
> > > >>>>>> If I try add from the beginning I am getting this:
> > > >>>>>>
> > > >>>>>> [ceph_deploy.conf][DEBUG ] found configuration file at: /home/.cephdeploy.conf
> > > >>>>>> [ceph_deploy.cli][INFO ] Invoked (1.5.22): /usr/bin/ceph-deploy mon add jin
> > > >>>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new to create a new cluster
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>>
> > > >>>>>> George
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> I think ceph-deploy mon add (instead of create) is what you
> > > >>>>>>> should be using.
> > > >>>>>>>
> > > >>>>>>> Cheers
> > > >>>>>>>
> > > >>>>>>> On 13/03/2015 22:25, Georgios Dimitrakakis wrote:
> > > >>>>>>>
> > > >>>>>>>> On an already available cluster I've tried to add a new
> > > >>>>>>>> monitor!
> > > >>>>>>>>
> > > >>>>>>>> I have used ceph-deploy mon create {NODE}
> > > >>>>>>>>
> > > >>>>>>>> where {NODE}=the name of the node
> > > >>>>>>>>
> > > >>>>>>>> and then I restarted the /etc/init.d/ceph service successfully
> > > >>>>>>>> on the node, where it showed that the monitor is running:
> > > >>>>>>>>
> > > >>>>>>>> # /etc/init.d/ceph restart
> > > >>>>>>>> === mon.jin ===
> > > >>>>>>>> === mon.jin ===
> > > >>>>>>>> Stopping Ceph mon.jin on jin...kill 36388...done
> > > >>>>>>>> === mon.jin ===
> > > >>>>>>>> Starting Ceph mon.jin on jin...
> > > >>>>>>>> Starting ceph-create-keys on jin...
> > > >>>>>>>>
> > > >>>>>>>> But checking the quorum doesn't show the newly added monitor!
> > > >>>>>>>>
> > > >>>>>>>> Plus ceph mon stat shows only 1 monitor!!!
> > > >>>>>>>>
> > > >>>>>>>> # ceph mon stat
> > > >>>>>>>> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu
> > > >>>>>>>>
> > > >>>>>>>> Any ideas on what I have done wrong???
> > > >>>>>>>>
> > > >>>>>>>> Regards,
> > > >>>>>>>>
> > > >>>>>>>> George