Anyone have any suggestions for how to troubleshoot this issue?

-------- Forwarded Message --------
Subject: Monitor stuck at "probing"
Date: Fri, 14 Jun 2019 21:40:39 -0500
From: ☣Adam <adam@xxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx

I have a monitor which I just can't seem to get to join the quorum, even after injecting a monmap from one of the other servers.[1] I use NTP on all servers and also manually verified the clocks are synchronized.

My monitors are named: ceph0, ceph2, xe, and tc

I'm transitioning away from the ceph# naming scheme, so please forgive the confusing [lack of a] naming convention.

The relevant output from:
ceph -s

1/4 mons down, quorum ceph0,ceph2,xe
mon: 4 daemons, quorum ceph0,ceph2,xe, out of quorum: tc

tc is up, bound to the expected IP address, and the ceph-mon service can be reached from xe, ceph0 and ceph2 using telnet. The mon_host and mon_initial_members from `ceph daemon mon.tc config show` look correct.

mon_status on tc shows the state as "probing" and the list of "extra_probe_peers" looks correct (correct IP addresses, and ports). However the monmap section looks wrong. The "mons" has all 4 servers, but the addr and public_addr values are 0.0.0.0:0. Furthermore it says the monmap epoch is 4. I don't understand why because I just injected a monmap which has an epoch of 7.

Here's the output of:
monmaptool --print ./monmap

monmaptool: monmap file ./monmap
epoch 7
fsid a690e404-3152-4804-a960-8b52abf3bd65
last_changed 2019-06-02 17:38:50.161035
created 2018-12-28 20:26:41.443339
0: 192.168.60.10:6789/0 mon.ceph0
1: 192.168.60.11:6789/0 mon.tc
2: 192.168.60.12:6789/0 mon.ceph2
3: 192.168.60.53:6789/0 mon.xe

When I injected it, I stopped ceph-mon, ran:
sudo ceph-mon -i tc --inject-monmap ./monmap

and started ceph-mon again. I then rebooted to see if it would fix this epoch/addr issue. It did not.

I'm attaching what I believe is the relevant section of my log file from the tc monitor.

I ran `ceph auth list` on tc and ceph2 and verified that the output is identical. This check was based on what I saw in the log and what I read in a blog post.[2]

What are the next steps in troubleshooting this issue?

Thanks,
Adam

[1] http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/
[2] https://medium.com/@george.shuklin/silly-mistakes-with-ceph-mon-9ef6c9eaab54
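The epoch and address comparison described in the message can be scripted rather than checked by eye. What follows is a minimal Python sketch, not an official Ceph tool. It parses the `monmaptool --print` text quoted in the message and compares it with the "monmap" section of mon_status. The `MON_STATUS` dict is a hypothetical stand-in built only from the description in the message (epoch 4, every addr 0.0.0.0:0); the function names are made up for illustration, and the assumed layout of mon_status (`monmap.epoch`, `monmap.mons[].name`, `monmap.mons[].addr`) should be checked against the real output on tc.

```python
import re

# Text of `monmaptool --print ./monmap`, copied from the message.
MONMAP_PRINT = """\
monmaptool: monmap file ./monmap
epoch 7
fsid a690e404-3152-4804-a960-8b52abf3bd65
last_changed 2019-06-02 17:38:50.161035
created 2018-12-28 20:26:41.443339
0: 192.168.60.10:6789/0 mon.ceph0
1: 192.168.60.11:6789/0 mon.tc
2: 192.168.60.12:6789/0 mon.ceph2
3: 192.168.60.53:6789/0 mon.xe
"""

# Hypothetical stand-in for mon_status on tc, shaped after the description
# in the message. Real output has more fields and may differ in detail.
MON_STATUS = {
    "state": "probing",
    "monmap": {
        "epoch": 4,
        "mons": [
            {"name": "ceph0", "addr": "0.0.0.0:0"},
            {"name": "tc", "addr": "0.0.0.0:0"},
            {"name": "ceph2", "addr": "0.0.0.0:0"},
            {"name": "xe", "addr": "0.0.0.0:0"},
        ],
    },
}


def parse_monmap_print(text):
    """Return (epoch, {mon_name: addr}) from `monmaptool --print` text."""
    epoch = None
    mons = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"epoch (\d+)$", line)
        if m:
            epoch = int(m.group(1))
            continue
        # Rank lines look like "1: 192.168.60.11:6789/0 mon.tc".
        m = re.match(r"(\d+): (\S+) mon\.(\S+)$", line)
        if m:
            mons[m.group(3)] = m.group(2)
    return epoch, mons


def diff_monmap(text, status):
    """List the differences between the injected map and the running one."""
    epoch, mons = parse_monmap_print(text)
    running = status["monmap"]
    problems = []
    if running["epoch"] != epoch:
        problems.append(f"epoch: injected {epoch}, running {running['epoch']}")
    running_addrs = {m["name"]: m["addr"] for m in running["mons"]}
    for name, addr in sorted(mons.items()):
        seen = running_addrs.get(name)
        if seen != addr:
            problems.append(f"{name}: injected {addr}, running {seen}")
    return problems


if __name__ == "__main__":
    for problem in diff_monmap(MONMAP_PRINT, MON_STATUS):
        print(problem)
```

To use it against a live monitor, replace `MON_STATUS` with the parsed JSON from mon_status on tc. An empty result would mean the running map matches the injected file.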
2019-06-14 21:16:29.293 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135e29500 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-06-14 21:16:31.213 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135bfd100 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2019-06-14 21:16:31.217 7fa2d7d97700 0 -- 192.168.60.11:6789/0 >> 192.168.60.53:6789/0 conn(0x557135d4e000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2019-06-14 21:16:31.221 7fa2d7596700 0 -- 192.168.60.11:6789/0 >> 192.168.60.12:6789/0 conn(0x557135bfd800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2019-06-14 21:16:32.173 7fa2d6d95700 0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 16639705050927474509 != 18374858748799134293
2019-06-14 21:16:32.173 7fa2d6d95700 0 mon.tc@-1(probing) e4 ms_verify_authorizer bad authorizer from mon 192.168.60.12:6789/0
2019-06-14 21:16:32.173 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.12:6789/0 conn(0x557135d85c00 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-06-14 21:16:42.121 7fa2d6d95700 0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 16639705050927474509 != 18374858748799134293
2019-06-14 21:16:42.121 7fa2d6d95700 0 mon.tc@-1(probing) e4 ms_verify_authorizer bad authorizer from mon 192.168.60.53:6789/0
2019-06-14 21:16:42.121 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.53:6789/0 conn(0x557135d85500 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-06-14 21:16:42.121 7fa2d6d95700 0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 16639705050927474509 != 18374858748799134293
2019-06-14 21:16:42.121 7fa2d6d95700 0 mon.tc@-1(probing) e4 ms_verify_authorizer bad authorizer from mon 192.168.60.53:6789/0
2019-06-14 21:16:42.121 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.53:6789/0 conn(0x557135d85500 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-06-14 21:16:44.293 7fa2d6d95700 0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 16639705050927474509 != 18374858748799134293
2019-06-14 21:16:44.293 7fa2d6d95700 0 mon.tc@-1(probing) e4 ms_verify_authorizer bad authorizer from mon 192.168.60.10:6789/0
2019-06-14 21:16:44.293 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135d85500 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-06-14 21:16:44.293 7fa2d6d95700 0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 16639705050927474509 != 18374858748799134293
2019-06-14 21:16:44.293 7fa2d6d95700 0 mon.tc@-1(probing) e4 ms_verify_authorizer bad authorizer from mon 192.168.60.10:6789/0
2019-06-14 21:16:44.293 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135d85500 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer
2019-06-14 21:16:46.213 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135bfd100 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2019-06-14 21:16:46.217 7fa2d7d97700 0 -- 192.168.60.11:6789/0 >> 192.168.60.53:6789/0 conn(0x557135d4e000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
2019-06-14 21:16:46.221 7fa2d7596700 0 -- 192.168.60.11:6789/0 >> 192.168.60.12:6789/0 conn(0x557135bfd800 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER
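The log excerpt above repeats the same authorizer failure for every peer, and a tally makes it easier to see whether any peer or direction is unaffected. Below is a rough Python sketch, assuming the messenger lines keep the shape shown in the excerpt. The sample lines are copied from the log; the function name is made up for illustration, and reading `handle_connect_msg` as an incoming connection and `handle_connect_reply` as an outgoing one is my assumption from the ACCEPTING/CONNECTING state names, not something taken from the Ceph documentation.

```python
import re
from collections import Counter

# A few lines copied from the tc monitor log excerpt.
SAMPLE_LOG = [
    "2019-06-14 21:16:29.293 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135e29500 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer",
    "2019-06-14 21:16:31.213 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.10:6789/0 conn(0x557135bfd100 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER",
    "2019-06-14 21:16:31.217 7fa2d7d97700 0 -- 192.168.60.11:6789/0 >> 192.168.60.53:6789/0 conn(0x557135d4e000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=0).handle_connect_reply connect got BADAUTHORIZER",
    "2019-06-14 21:16:32.173 7fa2d6d95700 0 mon.tc@-1(probing) e4 ms_verify_authorizer bad authorizer from mon 192.168.60.12:6789/0",
    "2019-06-14 21:16:32.173 7fa2d6d95700 0 -- 192.168.60.11:6789/0 >> 192.168.60.12:6789/0 conn(0x557135d85c00 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: got bad authorizer",
]

# Messenger lines: ">> <peer> conn(...).<handler>".
MSGR = re.compile(r">> (\S+) conn\(.*\)\.(handle_connect_msg|handle_connect_reply)\b")
# Matches both "bad authorizer" and "BADAUTHORIZER".
BAD_AUTH = re.compile(r"bad ?authorizer", re.IGNORECASE)


def tally_auth_failures(lines):
    """Count authorizer failures per (peer, direction) on messenger lines."""
    counts = Counter()
    for line in lines:
        m = MSGR.search(line)
        if not m or not BAD_AUTH.search(line):
            continue  # skip cephx/mon lines so each failure is counted once
        # Assumption: handle_connect_msg is the accepting (incoming) side.
        direction = "incoming" if m.group(2) == "handle_connect_msg" else "outgoing"
        counts[(m.group(1), direction)] += 1
    return counts


if __name__ == "__main__":
    for (peer, direction), n in sorted(tally_auth_failures(SAMPLE_LOG).items()):
        print(peer, direction, n)
```

Feeding the full log file to `tally_auth_failures` (for example `open(path).read().splitlines()`) gives the per-peer counts. The script only counts lines; it does not say why the authorizer is rejected.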