Re: Monitor stuck at "probing"

Thanks, this got me back on track.  After a lot of trial and error, I
found that the problem was, in fact, an authentication issue.  All fixed
now. :-)

Now that I've solved it, I realize that "nuke the monitor's store" was
probably referring to /var/lib/ceph/mon/ceph-tc.  I had been following
the directions in the deployment and operations manuals[1,2].  I now
understand how monitors work a whole lot better, so that's good.
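
For anyone who finds this thread later, here is a rough sketch of how
"nuke the store, remove it from the quorum, start over" could map onto
the manual procedure in [2].  It is a dry run that only prints the
commands.  The function name, the systemctl unit name, the /tmp file
names, and moving the old store aside instead of deleting it are
assumptions for illustration, not a record of the exact commands used
here.

```shell
#!/bin/sh
# rebuild_mon_plan MON_ID PUBLIC_ADDR
# Prints (does NOT run) one possible command sequence for rebuilding a
# single monitor's store.  Hypothetical helper, for illustration only.
rebuild_mon_plan() {
    id=$1
    addr=$2
    store=/var/lib/ceph/mon/ceph-$id   # default mon data dir layout
    cat <<EOF
systemctl stop ceph-mon@$id
ceph mon remove $id
mv $store $store.bak
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i $id --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
ceph-mon -i $id --public-addr $addr
EOF
}

# The three "ceph ..." lines need an admin keyring and a working quorum.
# The two "ceph-mon ..." lines run on the monitor host, as whichever
# user owns the store (often "ceph"; check before running).
```

Calling `rebuild_mon_plan tc 192.168.60.11:6789` prints the plan for the
monitor in this thread so each step can be reviewed before anything is
run by hand.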

The moral of the story here is: if you think you have an auth problem,
compare the monitor's keyring on disk with the keyring on the other
nodes.
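
A minimal sketch of that check, assuming the keyring sits in the
monitor's data directory as a file named "keyring" (for example
/var/lib/ceph/mon/ceph-tc/keyring) and that a copy of a healthy
monitor's keyring has been fetched to a local file.  The function names
and the /tmp path in the usage note are hypothetical.

```shell
#!/bin/sh
# mon_key FILE
# Print the value of every "key = ..." line in a keyring file.
mon_key() {
    sed -n 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*//p' "$1"
}

# keyrings_match FILE_A FILE_B
# Exit 0 if both files carry the same non-empty key value, 1 if the
# values differ or are empty, 2 if either file cannot be read.
keyrings_match() {
    a=$(mon_key "$1") || return 2
    b=$(mon_key "$2") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage would look something like
`keyrings_match /var/lib/ceph/mon/ceph-tc/keyring /tmp/keyring.ceph2 && echo match`.
Comparing only the key value, rather than the whole file, avoids false
alarms from whitespace or caps-line differences.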

Thanks,
Adam

[1]
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-mon/#remove-a-monitor
[2]
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-a-monitor-manual


On 6/20/19 11:50 AM, Gregory Farnum wrote:
> Just nuke the monitor's store, remove it from the existing quorum, and
> start over again. Injecting maps correctly is non-trivial and obviously
> something went wrong, and re-syncing a monitor is pretty cheap.
> 
> On Thu, Jun 20, 2019 at 6:46 AM ☣Adam <adam@xxxxxxxxx> wrote:
> 
>     Anyone have any suggestions for how to troubleshoot this issue?
> 
> 
>     -------- Forwarded Message --------
>     Subject: Monitor stuck at "probing"
>     Date: Fri, 14 Jun 2019 21:40:39 -0500
>     From: ☣Adam <adam@xxxxxxxxx>
>     To: ceph-users@xxxxxxxxxxxxxx
> 
>     I have a monitor which I just can't seem to get to join the quorum, even
>     after injecting a monmap from one of the other servers.[1]  I use NTP on
>     all servers and also manually verified the clocks are synchronized.
> 
> 
>     My monitors are named: ceph0, ceph2, xe, and tc
> 
>     I'm transitioning away from the ceph# naming scheme, so please forgive
>     the confusing [lack of a] naming convention.
> 
> 
>     The relevant output from: ceph -s
>     1/4 mons down, quorum ceph0,ceph2,xe
>     mon: 4 daemons, quorum ceph0,ceph2,xe, out of quorum: tc
> 
> 
>     tc is up, bound to the expected IP address, and the ceph-mon service can
>     be reached from xe, ceph0 and ceph2 using telnet.  The mon_host and
>     mon_initial_members from `ceph daemon mon.tc config show` look correct.
> 
>     mon_status on tc shows the state as "probing" and the list of
>     "extra_probe_peers" looks correct (correct IP addresses, and ports).
>     However the monmap section looks wrong.  The "mons" has all 4 servers,
>     but the addr and public_addr values are 0.0.0.0:0.  Furthermore it says
>     the monmap epoch is 4.  I don't understand why because I just injected a
>     monmap which has an epoch of 7.
> 
>     Here's the output of: monmaptool --print ./monmap
>     monmaptool: monmap file ./monmap
>     epoch 7
>     fsid a690e404-3152-4804-a960-8b52abf3bd65
>     last_changed 2019-06-02 17:38:50.161035
>     created 2018-12-28 20:26:41.443339
>     0: 192.168.60.10:6789/0 mon.ceph0
>     1: 192.168.60.11:6789/0 mon.tc
>     2: 192.168.60.12:6789/0 mon.ceph2
>     3: 192.168.60.53:6789/0 mon.xe
> 
>     When I injected it, I stopped ceph-mon, ran:
>     sudo ceph-mon -i tc --inject-monmap ./monmap
> 
>     and started ceph-mon again.  I then rebooted to see if it would fix this
>     epoch/addr issue.  It did not.
> 
>     I'm attaching what I believe is the relevant section of my log file from
>     the tc monitor.  I ran `ceph auth list` on tc and ceph2 and verified
>     that the output is identical.  This check was based on what I saw in the
>     log and what I read in a blog post.[2]
> 
>     What are the next steps in troubleshooting this issue?
> 
> 
>     Thanks,
>     Adam
> 
> 
>     [1]
>     http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/
>     [2]
>     https://medium.com/@george.shuklin/silly-mistakes-with-ceph-mon-9ef6c9eaab54
> 
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 