Re: Adding Monitor

Georgios Dimitrakakis <giorgis@xxxxxxxxxxxx> · Sat, 14 Mar 2015 01:48:35 +0200

Yes Sage!

Priority is to fix things!

Right now I don't have a healthy monitor!

Can I remove all of them and add the first one from scratch?

What would that mean for the data??

Best,

George

On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
This is the message that is flooding the ceph-mon.log now:

 2015-03-14 08:16:39.286823 7f9f6920b700  1 mon.fu@0(electing).elector(1) init, last seen epoch 1
 2015-03-14 08:16:42.736674 7f9f6880a700  1 mon.fu@0(electing) e2 adding peer 15.12.6.21:6789/0 to list of hints
 2015-03-14 08:16:42.737891 7f9f6880a700  1 mon.fu@0(electing).elector(1) discarding election message: 15.12.6.21:6789/0 not in my monmap e2: 2 mons at {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}

It sounds like you need to follow some variation of this procedure:

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

...although it may be that simply killing the daemon running on
15.12.6.21 and restarting the other mon daemons will be enough.  If not,
the procedure linked above will let you remove all traces of it and get
things up again.
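
Before removing anything, it can help to confirm from the log line which address is being rejected. The following is a minimal Python sketch, assuming the monmap summary always has the `{name=addr,...}` shape seen in the ceph-mon.log excerpt quoted above. `parse_monmap` and `peer_in_monmap` are hypothetical helper names used only for illustration; they are not part of Ceph.

```python
import re

# Matches entries such as "fu=192.168.1.100:6789/0" inside the braces
# of a monmap summary printed by ceph-mon.
MON_ENTRY = re.compile(r"(\w+)=(\d+\.\d+\.\d+\.\d+:\d+/\d+)")


def parse_monmap(summary):
    """Return {monitor name: address} from a '{name=addr,...}' summary."""
    start = summary.index("{")
    end = summary.index("}", start)
    return dict(MON_ENTRY.findall(summary[start + 1:end]))


def peer_in_monmap(peer_addr, monmap):
    """True if peer_addr is the address of a monitor listed in monmap."""
    return peer_addr in monmap.values()


# Tail of the "discarding election message" line from the log.
log_tail = ("not in my monmap e2: 2 mons at "
            "{fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}")

monmap = parse_monmap(log_tail)
print(monmap)
print(peer_in_monmap("15.12.6.21:6789/0", monmap))  # prints False
```

Here the election message comes from 15.12.6.21, while the monmap only lists 192.168.1.100 and 192.168.1.101, which is consistent with the "not in my monmap" message.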

Not quite sure where things went awry, but I assume the priority is to
get things working first and figure that out later!

sage

 George

> This is the log for monitor (ceph-mon.log) when I try to restart the
> monitor:
>
>
> 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 ***
> Got Signal Terminated ***
> 2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2
> shutdown
> 2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service shutdown
> 2015-03-14 07:47:26.384657 7f1f1dc0f700  0
> mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1
> services
> 2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service shutdown
> 2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9
> (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid
> 17050
> 2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0 at
> 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid
> a1132ec2-7104-4e8e-a3d5-95965cae9138
> 2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2
> preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
> 2015-03-14 07:47:27.704504 7fc04b4437a0  1
> mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh
> upgraded, format 0 -> 1
> 2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg v0
> on_upgrade discarding in-core PGMap
> 2015-03-14 07:47:27.837060 7fc04b4437a0  0 mon.fu@-1(probing).mds
> e104 print_map
> epoch	104
> flags	0
> created	2014-11-30 01:58:17.176938
> modified	2015-03-14 06:07:05.683239
> tableserver	0
> root	0
> session_timeout	60
> session_autoclose	300
> max_file_size	1099511627776
> last_failure	0
> last_failure_osd_epoch	1760
> compat	compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap}
> max_mds	1
> in	0
> up	{0=59315}
> failed
> stopped
> data_pools	3
> metadata_pool	4
> inline_data	disabled
> 59315:	15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9
>
> 2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd
> e1768 crush map has features 1107558400, adjusting msgr requires
> 2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd
> e1768 crush map has features 1107558400, adjusting msgr requires
> 2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd
> e1768 crush map has features 1107558400, adjusting msgr requires
> 2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd
> e1768 crush map has features 1107558400, adjusting msgr requires
> 2015-03-14 07:47:27.839054 7fc04b4437a0  1
> mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded,
> format 0 -> 1
> 2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2  my
> rank is now 0 (was -1)
> 2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 192.168.1.100:6789/0 >>
> 192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0
> c=0x38c0dc0).fault
>
>
>
>
>
>> I can no longer start my OSDs :-@
>>
>>
>> failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.6
>> --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move --
>> 6 3.63 host=fu root=default'
>>
>>
>> Please help!!!
>>
>> George
>>
>>> ceph mon add stops at this:
>>>
>>>
>>> [jin][INFO  ] Running command: sudo ceph mon getmap -o
>>> /var/lib/ceph/tmp/ceph.raijin.monmap
>>>
>>>
>>> and never gets over it!!!!!
>>>
>>>
>>> Any help??
>>>
>>> Thanks,
>>>
>>>
>>> George
>>>
>>>> Guys, any help is much appreciated because my cluster is down :-(
>>>>
>>>> After trying ceph mon add, which never completed because it was stuck
>>>> forever here:
>>>>
>>>> [jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0
>>>> monclient:
>>>> hunting for new mon
>>>> ^CKilled by signal 2.
>>>> [ceph_deploy][ERROR ] KeyboardInterrupt
>>>>
>>>>
>>>> the previously healthy node is now down completely :-(
>>>>
>>>> $ ceph mon stat
>>>> 2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 192.168.1.100:0/1042061
>>>> >> 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1
>>>> c=0x7ff248000e90).fault
>>>> ^CError connecting to cluster: InterruptedOrTimeoutError
>>>>
>>>>
>>>> Any ideas??
>>>>
>>>>
>>>> All the best,
>>>>
>>>> George
>>>>
>>>>
>>>>
>>>>> Georgios,
>>>>>
>>>>> you need to have a "deployment server" and cd into the folder that
>>>>> you originally used while deploying Ceph. That folder should already
>>>>> contain ceph.conf, the client.admin keyring and the other files that
>>>>> are required to connect to the cluster and provision new MONs, OSDs,
>>>>> etc.
>>>>>
>>>>> Message:
>>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new
>>>>> to create a new cluster...
>>>>>
>>>>> ...means (if I'm not mistaken) that you are running ceph-deploy from
>>>>> a folder other than the original one...
>>>>>
>>>>> On 13 March 2015 at 23:03, Georgios Dimitrakakis  wrote:
>>>>>
>>>>>> Not a firewall problem!! Firewall is disabled ...
>>>>>>
>>>>>> Loic, I've tried mon create because of this:
>>>>>>
>>>>>> http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors
>>>>>>
>>>>>> Should I first create and then add?? What is the proper order???
>>>>>> Should I do it from the already existing monitor node or can I
>>>>>> run it from the new one?
>>>>>>
>>>>>> If I try add from the beginning I am getting this:
>>>>>>
>>>>>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>>>>>> /home/.cephdeploy.conf
>>>>>> [ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
>>>>>> mon add jin
>>>>>> [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new
>>>>>> to create a new cluster
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> George
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think ceph-deploy mon add (instead of create) is what you
>>>>>>> should be using.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On 13/03/2015 22:25, Georgios Dimitrakakis wrote:
>>>>>>>
>>>>>>>> On an already available cluster I've tried to add a new
>>>>>>>> monitor!
>>>>>>>>
>>>>>>>> I have used ceph-deploy mon create {NODE}
>>>>>>>>
>>>>>>>> where {NODE}=the name of the node
>>>>>>>>
>>>>>>>> and then I restarted the /etc/init.d/ceph service with a
>>>>>>>> success at the node
>>>>>>>> where it showed that the monitor is running like:
>>>>>>>>
>>>>>>>> # /etc/init.d/ceph restart
>>>>>>>> === mon.jin ===
>>>>>>>> === mon.jin ===
>>>>>>>> Stopping Ceph mon.jin on jin...kill 36388...done
>>>>>>>> === mon.jin ===
>>>>>>>> Starting Ceph mon.jin on jin...
>>>>>>>> Starting ceph-create-keys on jin...
>>>>>>>>
>>>>>>>> But checking the quorum, it doesn't show the newly added
>>>>>>>> monitor!
>>>>>>>>
>>>>>>>> Plus ceph mon stat gives out only 1 monitor!!!
>>>>>>>>
>>>>>>>> # ceph mon stat
>>>>>>>> e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1,
>>>>>>>> quorum 0 fu
>>>>>>>>
>>>>>>>> Any ideas on what I have done wrong???
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> George
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list
>>>>>>>> ceph-users@xxxxxxxxxxxxxx [2]
>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]