Hi Wido,

after adding the hosts back to the monmap, the following error shows up in the ceph-mon log:

e5 ms_verify_authorizer bad authorizer from mon 10.111.73.3:6789/0

I tried to copy the mon keyring to all other nodes, but the problem still exists.

kind regards

Ben
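A "bad authorizer" between the mons usually comes down to the mon. secret not being identical everywhere (or to large clock skew). Each ceph-mon reads its key from its own data directory, not from /etc/ceph, so copying a keyring around does not help if a store was rebuilt with a different key. A rough check, assuming the default cluster name "ceph" and that each mon id matches the short hostname:

    # run on every monitor host and compare the "key =" line of the [mon.] section
    cat /var/lib/ceph/mon/ceph-$(hostname -s)/keyring

    # if one host differs: stop its mon, make that file match the surviving
    # monitor's mon. key (or rebuild the store, see below), then start it again
    systemctl stop ceph-mon@mon03
    # ...edit /var/lib/ceph/mon/ceph-mon03/keyring...
    systemctl start ceph-mon@mon03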
> Benjamin Naber <der-coder@xxxxxxxxxxxxxx> wrote on 26 July 2018 at 12:29:
> 
> hi Wido,
> 
> I now have one monitor online. I have removed the two others from the monmap.
> How can I proceed to reset those mon hosts and add them back as new monitors to the monmap?
> 
> kind regards
> 
> Ben
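For the re-adding part, a rough sketch (essentially the manual add-a-monitor steps; default paths assumed, cephx enabled, mon02 as the example) would be:

    # on mon01: export the mon. key and the current monmap
    ceph auth get mon. -o /tmp/ceph.mon.keyring
    ceph mon getmap -o /tmp/monmap

    # on mon02: wipe the old store and rebuild it from that map and key
    systemctl stop ceph-mon@mon02
    rm -rf /var/lib/ceph/mon/ceph-mon02
    ceph-mon -i mon02 --mkfs --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon02
    systemctl start ceph-mon@mon02

Add them back one at a time and wait for the new mon to join quorum before touching the next one.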
> > Wido den Hollander <wido@xxxxxxxx> wrote on 26 July 2018 at 11:52:
> > 
> > On 07/26/2018 11:50 AM, Benjamin Naber wrote:
> > > hi Wido,
> > > 
> > > got the following output since I've changed the debug setting:
> > > 
> > 
> > This is only debug_ms it seems?
> > 
> > debug_mon = 10
> > debug_ms = 10
> > 
> > Those two should be set; debug_mon will tell more about the election
> > process.
> > 
> > Wido
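Those two settings (debug_mon and debug_ms) can either go into ceph.conf on the mon hosts with a daemon restart, or be injected through the local admin socket, which still works while the monitors have no quorum (socket name assumed to match the mon id):

    # ceph.conf on each monitor host
    [mon]
        debug mon = 10
        debug ms = 10

    # or on a running daemon, without a restart:
    ceph daemon mon.mon01 config set debug_mon 10
    ceph daemon mon.mon01 config set debug_ms 10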
> > > 2018-07-26 11:46:21.004490 7f819e968700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 cs=1 l=1)._try_send sent bytes 9 remaining bytes 0
> > > 2018-07-26 11:46:21.004520 7f81a196e700 10 -- 10.111.73.1:6789/0 dispatch_throttle_release 60 to dispatch throttler 60/104857600
> > > 2018-07-26 11:46:23.058057 7f81a4173700 1 -- 10.111.73.1:6789/0 >> 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77 cs=1 l=1).mark_down
> > > 2018-07-26 11:46:23.058084 7f81a4173700 2 -- 10.111.73.1:6789/0 >> 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77 cs=1 l=1)._stop
> > > 2018-07-26 11:46:23.058094 7f81a4173700 10 -- 10.111.73.1:6789/0 >> 10.111.73.2:0/3994280291 conn(0x55aa46c46000 :6789 s=STATE_OPEN pgs=77 cs=1 l=1).discard_out_queue started
> > > 2018-07-26 11:46:23.058120 7f81a4173700 1 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 cs=1 l=1).mark_down
> > > 2018-07-26 11:46:23.058131 7f81a4173700 2 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 cs=1 l=1)._stop
> > > 2018-07-26 11:46:23.058143 7f81a4173700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46c4a800 :6789 s=STATE_OPEN pgs=71 cs=1 l=1).discard_out_queue started
> > > 2018-07-26 11:46:23.962796 7f819d966700 10 Processor -- accept listen_fd=22
> > > 2018-07-26 11:46:23.962845 7f819d966700 10 Processor -- accept accepted incoming on sd 23
> > > 2018-07-26 11:46:23.962858 7f819d966700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46afd800 :-1 s=STATE_NONE pgs=0 cs=0 l=0).accept sd=23
> > > 2018-07-26 11:46:23.962929 7f819e167700 1 -- 10.111.73.1:6789/0 >> - conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=23 -
> > > 2018-07-26 11:46:23.963022 7f819e167700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._try_send sent bytes 281 remaining bytes 0
> > > 2018-07-26 11:46:23.963045 7f819e167700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 l=0)._process_connection write banner and addr done: -
> > > 2018-07-26 11:46:23.963091 7f819e167700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 l=0)._process_connection accept peer addr is 10.111.73.1:0/1745436331
> > > 2018-07-26 11:46:23.963190 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1)._process_connection accept of host_type 8, policy.lossy=1 policy.server=1 policy.standby=0 policy.resetcheck=0
> > > 2018-07-26 11:46:23.963216 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept my proto 15, their proto 15
> > > 2018-07-26 11:46:23.963232 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept setting up session_security.
> > > 2018-07-26 11:46:23.963248 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept new session
> > > 2018-07-26 11:46:23.963256 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=87 cs=1 l=1).handle_connect_msg accept success, connect_seq = 1 in_seq=0, sending READY
> > > 2018-07-26 11:46:23.963264 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=87 cs=1 l=1).handle_connect_msg accept features 4611087853745930235
> > > 2018-07-26 11:46:23.963315 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=87 cs=1 l=1)._try_send sent bytes 34 remaining bytes 0
> > > 2018-07-26 11:46:23.963356 7f819e167700 2 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_SEQ pgs=87 cs=1 l=1).handle_connect_msg accept write reply msg done
> > > 2018-07-26 11:46:23.963442 7f819e167700 2 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_SEQ pgs=87 cs=1 l=1)._process_connection accept get newly_acked_seq 0
> > > 2018-07-26 11:46:23.963461 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_ACCEPTING_WAIT_SEQ pgs=87 cs=1 l=1).discard_requeued_up_to 0
> > > 2018-07-26 11:46:23.963634 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_OPEN_KEEPALIVE2 pgs=87 cs=1 l=1)._append_keepalive_or_ack
> > > 2018-07-26 11:46:23.963658 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=87 cs=1 l=1).process wants 60 bytes from policy throttler 120/104857600
> > > 2018-07-26 11:46:23.963679 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=87 cs=1 l=1).process aborted = 0
> > > 2018-07-26 11:46:23.963705 7f819e167700 5 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=87 cs=1 l=1). rx client.? seq 1 0x55aa46be4480 auth(proto 0 30 bytes epoch 0) v1
> > > 2018-07-26 11:46:23.963750 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_OPEN pgs=87 cs=1 l=1).handle_write
> > > 2018-07-26 11:46:23.963755 7f81a196e700 1 -- 10.111.73.1:6789/0 <== client.? 10.111.73.1:0/1745436331 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (4135352935 0 0) 0x55aa46be4480 con 0x55aa46afd800
> > > 2018-07-26 11:46:23.963808 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.1:0/1745436331 conn(0x55aa46afd800 :6789 s=STATE_OPEN pgs=87 cs=1 l=1)._try_send sent bytes 9 remaining bytes 0
> > > 2018-07-26 11:46:23.963823 7f81a196e700 10 -- 10.111.73.1:6789/0 dispatch_throttle_release 60 to dispatch throttler 60/104857600
> > > 2018-07-26 11:46:24.003866 7f819d966700 10 Processor -- accept listen_fd=22
> > > 2018-07-26 11:46:24.003902 7f819d966700 10 Processor -- accept accepted incoming on sd 26
> > > 2018-07-26 11:46:24.003911 7f819d966700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46bc1000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).accept sd=26
> > > 2018-07-26 11:46:24.004001 7f819e167700 1 -- 10.111.73.1:6789/0 >> - conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=26 -
> > > 2018-07-26 11:46:24.004057 7f819e167700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._try_send sent bytes 281 remaining bytes 0
> > > 2018-07-26 11:46:24.004071 7f819e167700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 l=0)._process_connection write banner and addr done: -
> > > 2018-07-26 11:46:24.004199 7f819e167700 10 -- 10.111.73.1:6789/0 >> - conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 l=0)._process_connection accept peer addr is 10.111.73.3:0/1033315403
> > > 2018-07-26 11:46:24.004286 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1)._process_connection accept of host_type 8, policy.lossy=1 policy.server=1 policy.standby=0 policy.resetcheck=0
> > > 2018-07-26 11:46:24.004304 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept my proto 15, their proto 15
> > > 2018-07-26 11:46:24.004319 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept setting up session_security.
> > > 2018-07-26 11:46:24.004338 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept new session
> > > 2018-07-26 11:46:24.004351 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=74 cs=1 l=1).handle_connect_msg accept success, connect_seq = 1 in_seq=0, sending READY
> > > 2018-07-26 11:46:24.004365 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=74 cs=1 l=1).handle_connect_msg accept features 4611087853745930235
> > > 2018-07-26 11:46:24.004463 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=74 cs=1 l=1)._try_send sent bytes 34 remaining bytes 0
> > > 2018-07-26 11:46:24.004489 7f819e167700 2 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_SEQ pgs=74 cs=1 l=1).handle_connect_msg accept write reply msg done
> > > 2018-07-26 11:46:24.004634 7f819e167700 2 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_SEQ pgs=74 cs=1 l=1)._process_connection accept get newly_acked_seq 0
> > > 2018-07-26 11:46:24.004650 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_ACCEPTING_WAIT_SEQ pgs=74 cs=1 l=1).discard_requeued_up_to 0
> > > 2018-07-26 11:46:24.004807 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN_KEEPALIVE2 pgs=74 cs=1 l=1)._append_keepalive_or_ack
> > > 2018-07-26 11:46:24.004828 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=74 cs=1 l=1).process wants 60 bytes from policy throttler 180/104857600
> > > 2018-07-26 11:46:24.004847 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74 cs=1 l=1).process aborted = 0
> > > 2018-07-26 11:46:24.004873 7f819e167700 5 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74 cs=1 l=1). rx client.? seq 1 0x55aa46be4fc0 auth(proto 0 30 bytes epoch 0) v1
> > > 2018-07-26 11:46:24.004914 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN pgs=74 cs=1 l=1).handle_write
> > > 2018-07-26 11:46:24.004921 7f81a196e700 1 -- 10.111.73.1:6789/0 <== client.? 10.111.73.3:0/1033315403 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (2547518125 0 0) 0x55aa46be4fc0 con 0x55aa46bc1000
> > > 2018-07-26 11:46:24.004954 7f819e167700 10 -- 10.111.73.1:6789/0 >> 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN pgs=74 cs=1 l=1)._try_send sent bytes 9 remaining bytes 0
> > > 2018-07-26 11:46:24.004965 7f81a196e700 10 -- 10.111.73.1:6789/0 dispatch_throttle_release 60 to dispatch throttler 60/104857600
> > > 
> > > kind regards
> > > 
> > > ben
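Most of the output above is messenger chatter from debug_ms; once debug_mon is raised it is the election and probing lines that tell the story. Something along these lines pulls them out (default log location assumed, the patterns are only examples):

    grep -E 'election|probing|paxos' /var/log/ceph/ceph-mon.mon01.log | tail -n 50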
> > > > Wido den Hollander <wido@xxxxxxxx> wrote on 26 July 2018 at 11:07:
> > > > 
> > > > On 07/26/2018 10:33 AM, Benjamin Naber wrote:
> > > > > hi Wido,
> > > > > 
> > > > > Thanks for your reply.
> > > > > Time is also in sync. I forced a time sync again to be sure.
> > > > > 
> > > > 
> > > > Try setting debug_mon to 10 or even 20 and check the logs about what the
> > > > MONs are saying.
> > > > 
> > > > debug_ms = 10 might also help to get some more information about the
> > > > Messenger traffic.
> > > > 
> > > > Wido
> > > > 
> > > > > kind regards
> > > > > 
> > > > > Ben
> > > > > 
> > > > > > Wido den Hollander <wido@xxxxxxxx> wrote on 26 July 2018 at 10:18:
> > > > > > 
> > > > > > On 07/26/2018 10:12 AM, Benjamin Naber wrote:
> > > > > > > Hi together,
> > > > > > > 
> > > > > > > we currently have some problems with monitor quorum after shutting
> > > > > > > down all cluster nodes for migration to another location.
> > > > > > > 
> > > > > > > mon_status gives us the following output:
> > > > > > > 
> > > > > > > {
> > > > > > >     "name": "mon01",
> > > > > > >     "rank": 0,
> > > > > > >     "state": "electing",
> > > > > > >     "election_epoch": 20345,
> > > > > > >     "quorum": [],
> > > > > > >     "features": {
> > > > > > >         "required_con": "153140804152475648",
> > > > > > >         "required_mon": [
> > > > > > >             "kraken",
> > > > > > >             "luminous"
> > > > > > >         ],
> > > > > > >         "quorum_con": "0",
> > > > > > >         "quorum_mon": []
> > > > > > >     },
> > > > > > >     "outside_quorum": [],
> > > > > > >     "extra_probe_peers": [],
> > > > > > >     "sync_provider": [],
> > > > > > >     "monmap": {
> > > > > > >         "epoch": 1,
> > > > > > >         "fsid": "c1e3c489-67a4-47a2-a3ca-98816d1c9d44",
> > > > > > >         "modified": "2018-06-21 13:48:58.796939",
> > > > > > >         "created": "2018-06-21 13:48:58.796939",
> > > > > > >         "features": {
> > > > > > >             "persistent": [
> > > > > > >                 "kraken",
> > > > > > >                 "luminous"
> > > > > > >             ],
> > > > > > >             "optional": []
> > > > > > >         },
> > > > > > >         "mons": [
> > > > > > >             {
> > > > > > >                 "rank": 0,
> > > > > > >                 "name": "mon01",
> > > > > > >                 "addr": "10.111.73.1:6789/0",
> > > > > > >                 "public_addr": "10.111.73.1:6789/0"
> > > > > > >             },
> > > > > > >             {
> > > > > > >                 "rank": 1,
> > > > > > >                 "name": "mon02",
> > > > > > >                 "addr": "10.111.73.2:6789/0",
> > > > > > >                 "public_addr": "10.111.73.2:6789/0"
> > > > > > >             },
> > > > > > >             {
> > > > > > >                 "rank": 2,
> > > > > > >                 "name": "mon03",
> > > > > > >                 "addr": "10.111.73.3:6789/0",
> > > > > > >                 "public_addr": "10.111.73.3:6789/0"
> > > > > > >             }
> > > > > > >         ]
> > > > > > >     },
> > > > > > >     "feature_map": {
> > > > > > >         "mon": {
> > > > > > >             "group": {
> > > > > > >                 "features": "0x3ffddff8eea4fffb",
> > > > > > >                 "release": "luminous",
> > > > > > >                 "num": 1
> > > > > > >             }
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > > 
> > > > > > > ceph ping mon.id also just doesn't work. The monitor nodes
> > > > > > > have full network connectivity. Firewall rules are also OK.
> > > > > > > 
> > > > > > > What could be the reason for the stuck quorum election?
> > > > > > > 
> > > > > > 
> > > > > > Is the time in sync between the nodes?
> > > > > > 
> > > > > > Wido
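Worth re-checking on all three hosts even if NTP looks healthy, since the monitors only tolerate about 0.05 s of drift by default (mon_clock_drift_allowed). A quick comparison, assuming chrony or ntpd is in use and SSH access between the hosts:

    # rough skew check, run from one of the mon hosts
    for h in 10.111.73.1 10.111.73.2 10.111.73.3; do ssh $h date +%s.%N; done

    # and per host, the state of the local time daemon
    chronyc tracking    # or: ntpq -p
    timedatectl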
> > > > > > > kind regards
> > > > > > > 
> > > > > > > Ben

___________________________________________________
Benjamin Naber • Holzstraße 7 • D-73650 Winterbach
Mobil: +49 (0) 152.34087809
E-Mail: benjamin.naber@xxxxxxxxxxxxxx
___________________________________________________
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com