Thanks, this got me back on track. After a lot of trial and error, I
found that the problem was, in fact, an authentication issue. All fixed
now. :-)

Now that I solved it, I figured out that "nuke the monitor's store" was
probably referring to /var/lib/ceph/mon/ceph-tc. I was following the
directions in the deployment and operations manuals[1,2]. Now I know how
monitors work a whole lot better, so that's good.

The moral of the story here is: if you think you have an auth problem,
check the keyring on disk to see if it matches the other nodes.

Thanks,
Adam

[1]
http://docs.ceph.com/docs/master/rados/deployment/ceph-deploy-mon/#remove-a-monitor
[2]
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-a-monitor-manual

On 6/20/19 11:50 AM, Gregory Farnum wrote:
> Just nuke the monitor's store, remove it from the existing quorum, and
> start over again. Injecting maps correctly is non-trivial and obviously
> something went wrong, and re-syncing a monitor is pretty cheap.
>
> On Thu, Jun 20, 2019 at 6:46 AM ☣Adam <adam@xxxxxxxxx> wrote:
>
> > Anyone have any suggestions for how to troubleshoot this issue?
> >
> > -------- Forwarded Message --------
> > Subject: Monitor stuck at "probing"
> > Date: Fri, 14 Jun 2019 21:40:39 -0500
> > From: ☣Adam <adam@xxxxxxxxx>
> > To: ceph-users@xxxxxxxxxxxxxx
> >
> > I have a monitor which I just can't seem to get to join the quorum, even
> > after injecting a monmap from one of the other servers.[1] I use NTP on
> > all servers and also manually verified the clocks are synchronized.
> >
> > My monitors are named: ceph0, ceph2, xe, and tc
> >
> > I'm transitioning away from the ceph# naming scheme, so please forgive
> > the confusing [lack of a] naming convention.
> >
> > The relevant output from: ceph -s
> >     1/4 mons down, quorum ceph0,ceph2,xe
> >     mon: 4 daemons, quorum ceph0,ceph2,xe, out of quorum: tc
> >
> > tc is up, bound to the expected IP address, and the ceph-mon service can
> > be reached from xe, ceph0 and ceph2 using telnet. The mon_host and
> > mon_initial_members from `ceph daemon mon.tc config show` look correct.
> >
> > mon_status on tc shows the state as "probing" and the list of
> > "extra_probe_peers" looks correct (correct IP addresses, and ports).
> > However the monmap section looks wrong. The "mons" has all 4 servers,
> > but the addr and public_addr values are 0.0.0.0:0. Furthermore it says
> > the monmap epoch is 4. I don't understand why because I just injected a
> > monmap which has an epoch of 7.
> >
> > Here's the output of: monmaptool --print ./monmap
> >     monmaptool: monmap file ./monmap
> >     epoch 7
> >     fsid a690e404-3152-4804-a960-8b52abf3bd65
> >     last_changed 2019-06-02 17:38:50.161035
> >     created 2018-12-28 20:26:41.443339
> >     0: 192.168.60.10:6789/0 mon.ceph0
> >     1: 192.168.60.11:6789/0 mon.tc
> >     2: 192.168.60.12:6789/0 mon.ceph2
> >     3: 192.168.60.53:6789/0 mon.xe
> >
> > When I injected it, I stopped ceph-mon, ran:
> >     sudo ceph-mon -i tc --inject-monmap ./monmap
> >
> > and started ceph-mon again. I then rebooted to see if it would fix this
> > epoch/addr issue. It did not.
> >
> > I'm attaching what I believe is the relevant section of my log file from
> > the tc monitor. I ran `ceph auth list` on tc and ceph2 and verified
> > that the output is identical. This check was based on what I saw in the
> > log and what I read in a blog post.[2]
> >
> > What are the next steps in troubleshooting this issue?
> >
> > Thanks,
> > Adam
> >
> > [1]
> > http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/
> > [2]
> > https://medium.com/@george.shuklin/silly-mistakes-with-ceph-mon-9ef6c9eaab54
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
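
The moral at the top of this thread (check that the monitor's on-disk
keyring matches the other nodes) can be sketched as a small shell helper.
This is only an illustration, not something posted in the thread: the
function names keyring_key and keyrings_match are made up, and it assumes
each monitor keeps its keyring in a file named "keyring" inside its store
directory (for example /var/lib/ceph/mon/ceph-tc/keyring) and that each
node's copy has already been fetched to one machine.

```shell
#!/bin/sh
# Hypothetical helpers for comparing monitor keyrings copied from several nodes.
# Assumed keyring layout: a "[mon.]" section containing a line "key = <base64>".

# Print the secret from one keyring file (the value after "key =").
keyring_key() {
    sed -n 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Return 0 if every keyring file given holds the same key, 1 otherwise.
keyrings_match() {
    first=$(keyring_key "$1")
    for f in "$@"; do
        [ "$(keyring_key "$f")" = "$first" ] || return 1
    done
    return 0
}

# Example use, with local copies named after the monitors in this thread:
#   keyrings_match keyring.ceph0 keyring.ceph2 keyring.xe keyring.tc \
#       || echo "keyring mismatch: compare each file's key line"
```

Comparing only the key line avoids false alarms from whitespace or caps
lines that differ between files. A mismatch here would point at the same
kind of problem described above, where `ceph auth list` looked identical
on both nodes but the monitor's own keyring on disk did not.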