I have now additionally enabled debug_mon and debug_ms at maximum. To me those logs look fine, at least at the network connection level. However, it seems the mgr is not authorizing the mon for some reason; do you agree?
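(For reference, this is roughly how I raised the levels, via the daemon's admin socket inside the mon container, which also works while the mon is out of quorum; the daemon name is from our setup:

ceph daemon mon.controller2 config set debug_mon 20/20
ceph daemon mon.controller2 config set debug_ms 20/20

and back down afterwards, to what I believe are the defaults:

ceph daemon mon.controller2 config set debug_mon 1/5
ceph daemon mon.controller2 config set debug_ms 0/5
)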
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 bootstrap
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 sync_reset_requester
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 unregister_cluster_logger - not registered
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 cancel_probe_timeout (none scheduled)
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 monmap e16: 3 mons at {controller1=[v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0],controller4=[v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0],controller5=[v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0]}
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 _reset
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing).auth v47053 _set_mon_num_rank num 0 rank 0
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 cancel_probe_timeout (none scheduled)
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 timecheck_finish
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 15 mon.controller2@-1(probing) e16 health_tick_stop
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 15 mon.controller2@-1(probing) e16 health_interval_stop
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 scrub_event_cancel
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 scrub_reset
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 cancel_probe_timeout (none scheduled)
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 reset_probe_timeout 0x55e468a8ac20 after 2 seconds
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 probing other monitors
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] send_to--> mon [v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0] -- mon_probe(probe e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller2 leader -1 new mon_release pacific) v8 -- ?+0 0x55e471b39a00
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] --> [v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0] -- mon_probe(probe e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller2 leader -1 new mon_release pacific) v8 -- 0x55e471b39a00 con 0x55e45e6cac00
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] send_to--> mon [v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0] -- mon_probe(probe e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller2 leader -1 new mon_release pacific) v8 -- ?+0 0x55e475389c00
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] --> [v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0] -- mon_probe(probe e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller2 leader -1 new mon_release pacific) v8 -- 0x55e475389c00 con 0x55e45e6cb000
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] send_to--> mon [v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0] -- mon_probe(probe e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller2 leader -1 new mon_release pacific) v8 -- ?+0 0x55e473e66400
debug 2022-04-01T11:01:46.574+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] --> [v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0] -- mon_probe(probe e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller2 leader -1 new mon_release pacific) v8 -- 0x55e473e66400 con 0x55e45ea18800
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] <== mon.2 v2:192.168.9.210:3300/0 4661 ==== mon_probe(reply e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller5 quorum 0,1,2 leader 0 paxos( fc 134362058 lc 134362749 ) mon_release pacific) v8 ==== 562+0+0 (secure 0 0 0) 0x55e473e66400 con 0x55e45ea18800
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 mon.controller2@-1(probing) e16 _ms_dispatch existing session 0x55e45e5c38c0 for mon.2
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 mon.controller2@-1(probing) e16 entity_name global_id 0 (none) caps allow *
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 is_capable service=mon command= read addr v2:192.168.9.210:3300/0 on cap allow *
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 allow so far , doing grant allow *
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 allow all
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 handle_probe mon_probe(reply e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller5 quorum 0,1,2 leader 0 paxos( fc 134362058 lc 134362749 ) mon_release pacific) v8
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 handle_probe_reply mon.2 v2:192.168.9.210:3300/0 mon_probe(reply e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller5 quorum 0,1,2 leader 0 paxos( fc 134362058 lc 134362749 ) mon_release pacific) v8
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 monmap is e16: 3 mons at {controller1=[v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0],controller4=[v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0],controller5=[v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0]}
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 peer name is controller5
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 existing quorum 0,1,2
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 peer paxos version 134362749 vs my version 134362745 (ok)
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 ready to join, but i'm not in the monmap/my addr is blank/location is wrong, trying to join
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] send_to--> mon [v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0] -- mon_join(controller2 [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] {}) v3 -- ?+0 0x55e472810b40
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] --> [v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0] -- mon_join(controller2 [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] {}) v3 -- 0x55e472810b40 con 0x55e45e6cac00
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] <== mon.1 v2:192.168.9.209:3300/0 4496 ==== mon_probe(reply e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller4 quorum 0,1,2 leader 0 paxos( fc 134362058 lc 134362749 ) mon_release pacific) v8 ==== 562+0+0 (secure 0 0 0) 0x55e475389c00 con 0x55e45e6cb000
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 mon.controller2@-1(probing) e16 _ms_dispatch existing session 0x55e45e5c3b00 for mon.1
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 mon.controller2@-1(probing) e16 entity_name global_id 0 (none) caps allow *
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 is_capable service=mon command= read addr v2:192.168.9.209:3300/0 on cap allow *
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 allow so far , doing grant allow *
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 20 allow all
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 handle_probe mon_probe(reply e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller4 quorum 0,1,2 leader 0 paxos( fc 134362058 lc 134362749 ) mon_release pacific) v8
debug 2022-04-01T11:01:46.574+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 handle_probe_reply mon.1 v2:192.168.9.209:3300/0 mon_probe(reply e248e2e0-0db7-454c-9cfd-7ff1cce99786 name controller4 quorum 0,1,2 leader 0 paxos( fc 134362058 lc 134362749 ) mon_release pacific) v8
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 monmap is e16: 3 mons at {controller1=[v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0],controller4=[v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0],controller5=[v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0]}
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 peer name is controller4
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 existing quorum 0,1,2
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 peer paxos version 134362749 vs my version 134362745 (ok)
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 ready to join, but i'm not in the monmap/my addr is blank/location is wrong, trying to join
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] send_to--> mon [v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0] -- mon_join(controller2 [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] {}) v3 -- ?+0 0x55e472810d20
debug 2022-04-01T11:01:46.575+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] --> [v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0] -- mon_join(controller2 [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] {}) v3 -- 0x55e472810d20 con 0x55e45e6cac00
debug 2022-04-01T11:01:47.287+0000 7ff8010ae700 1 --1- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] >> conn(0x55e45fb14000 0x55e4640f5000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=18 legacy v1:192.168.9.207:6789/0 socket_addr v1:192.168.9.207:6789/0 target_addr v1:192.168.9.209:50566/0
debug 2022-04-01T11:01:47.288+0000 7ff8010ae700 10 mon.controller2@-1(probing) e16 handle_auth_request con 0x55e45fb14000 (start) method 0 payload 0
debug 2022-04-01T11:01:47.288+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 ms_handle_accept con 0x55e45fb14000 no session
debug 2022-04-01T11:01:47.288+0000 7ff8020b0700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] <== client.? v1:192.168.9.209:0/3013530895 1 ==== auth(proto 0 39 bytes epoch 0) v1 ==== 69+0+0 (unknown 634677387 0 0) 0x55e45ed4db00 con 0x55e45fb14000
debug 2022-04-01T11:01:47.288+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 _ms_dispatch new session 0x55e47283c6c0 MonSession(client.? v1:192.168.9.209:0/3013530895 is open , features 0x3f01cfb9fffdffff (luminous)) features 0x3f01cfb9fffdffff
debug 2022-04-01T11:01:47.288+0000 7ff8020b0700 20 mon.controller2@-1(probing) e16 entity_name global_id 0 (none) caps
debug 2022-04-01T11:01:47.288+0000 7ff8020b0700 5 mon.controller2@-1(probing) e16 waitlisting message auth(proto 0 39 bytes epoch 0) v1
debug 2022-04-01T11:01:47.289+0000 7ff8010ae700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] >> v1:192.168.9.209:0/3013530895 conn(0x55e45fb14000 legacy=0x55e4640f5000 unknown :6789 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 18
debug 2022-04-01T11:01:47.289+0000 7ff8010ae700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] >> v1:192.168.9.209:0/3013530895 conn(0x55e45fb14000 legacy=0x55e4640f5000 unknown :6789 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
debug 2022-04-01T11:01:47.289+0000 7ff8010ae700 1 --1- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] >> v1:192.168.9.209:0/3013530895 conn(0x55e45fb14000 0x55e4640f5000 :6789 s=OPENED pgs=2 cs=1 l=1).handle_message read tag failed
debug 2022-04-01T11:01:47.289+0000 7ff8010ae700 1 --1- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] >> v1:192.168.9.209:0/3013530895 conn(0x55e45fb14000 0x55e4640f5000 :6789 s=OPENED pgs=2 cs=1 l=1).fault on lossy channel, failing
debug 2022-04-01T11:01:47.289+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 ms_handle_reset 0x55e45fb14000 v1:192.168.9.209:0/3013530895
debug 2022-04-01T11:01:47.289+0000 7ff8010ae700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] reap_dead start
debug 2022-04-01T11:01:47.289+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 reset/close on session client.? v1:192.168.9.209:0/3013530895
debug 2022-04-01T11:01:47.289+0000 7ff8020b0700 10 mon.controller2@-1(probing) e16 remove_session 0x55e47283c6c0 client.? v1:192.168.9.209:0/3013530895 features 0x3f01cfb9fffdffff
debug 2022-04-01T11:01:47.525+0000 7ff7ff0aa700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
debug 2022-04-01T11:01:47.525+0000 7ff7ff0aa700 1 -- 192.168.9.207:0/2682856153 --> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] -- mgropen(unknown.controller2) v3 -- 0x55e45d815680 con 0x55e45e8f5000
debug 2022-04-01T11:01:47.525+0000 7ff8010ae700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
debug 2022-04-01T11:01:47.525+0000 7ff8010ae700 10 mon.controller2@-1(probing) e16 get_authorizer for mgr
debug 2022-04-01T11:01:47.526+0000 7ff8010ae700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 secure :-1 s=READY pgs=764161 cs=0 l=1 rev1=1 rx=0x55e46029cf90 tx=0x55e45e8fd200).ready entity=mgr.145817085 client_cookie=bff4d2e044b0a87c server_cookie=0 in_seq=0 out_seq=0
debug 2022-04-01T11:01:47.526+0000 7ff8010ae700 1 -- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 msgr2=0x55e45e772500 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 18
debug 2022-04-01T11:01:47.526+0000 7ff8010ae700 1 -- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 msgr2=0x55e45e772500 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
debug 2022-04-01T11:01:47.526+0000 7ff7fe8a9700 1 -- 192.168.9.207:0/2682856153 <== mgr.145817085 v2:192.168.9.208:6834/1694443943 1 ==== mgrconfigure(period=5, threshold=5) v4 ==== 13+0+0 (secure 0 0 0) 0x55e461a42e00 con 0x55e45e8f5000
debug 2022-04-01T11:01:47.526+0000 7ff8010ae700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 secure :-1 s=READY pgs=764161 cs=0 l=1 rev1=1 rx=0x55e46029cf90 tx=0x55e45e8fd200).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)
debug 2022-04-01T11:01:47.526+0000 7ff8010ae700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 secure :-1 s=READY pgs=764161 cs=0 l=1 rev1=1 rx=0x55e46029cf90 tx=0x55e45e8fd200).stop
debug 2022-04-01T11:01:47.526+0000 7ff7fe8a9700 1 -- 192.168.9.207:0/2682856153 --> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] -- mgrreport(unknown.controller2 +98-0 packed 1158 task_status=0) v9 -- 0x55e471b3a380 con 0x55e45e8f5000
debug 2022-04-01T11:01:47.526+0000 7ff7fe8a9700 1 -- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 msgr2=0x55e45e772500 unknown :-1 s=STATE_CLOSED l=1).mark_down
debug 2022-04-01T11:01:47.526+0000 7ff7fe8a9700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 unknown :-1 s=CLOSED pgs=764161 cs=0 l=1 rev1=1 rx=0 tx=0).stop
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 11 mon.controller2@-1(probing) e16 tick
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 trimming session 0x55e45e8f4400 client.? because we've been out of quorum too long
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 1 -- [v2:192.168.9.207:3300/0,v1:192.168.9.207:6789/0] >> v1:192.168.9.207:0/170763801 conn(0x55e45e8f4400 legacy=0x55e46ec8b800 unknown :6789 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 remove_session 0x55e47283c480 client.? v1:192.168.9.207:0/170763801 features 0x3f01cfb8ffedffff
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 20 mon.controller2@-1(probing) e16 sync_trim_providers
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 session closed, dropping 0x55e45f2eed80
debug 2022-04-01T11:01:48.054+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 session closed, dropping 0x55e45ed4db00
debug 2022-04-01T11:01:48.525+0000 7ff7ff0aa700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
debug 2022-04-01T11:01:48.525+0000 7ff7ff0aa700 1 -- 192.168.9.207:0/2682856153 --> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] -- mgropen(unknown.controller2) v3 -- 0x55e45d815440 con 0x55e45e8f5000
debug 2022-04-01T11:01:48.525+0000 7ff8060b8700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
debug 2022-04-01T11:01:48.525+0000 7ff8060b8700 10 mon.controller2@-1(probing) e16 get_authorizer for mgr
debug 2022-04-01T11:01:48.526+0000 7ff8060b8700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 secure :-1 s=READY pgs=764162 cs=0 l=1 rev1=1 rx=0x55e46198c3f0 tx=0x55e45e8a5bc0).ready entity=mgr.145817085 client_cookie=b845d9dc02af57cd server_cookie=0 in_seq=0 out_seq=0
debug 2022-04-01T11:01:48.526+0000 7ff7fe8a9700 1 -- 192.168.9.207:0/2682856153 <== mgr.145817085 v2:192.168.9.208:6834/1694443943 1 ==== mgrconfigure(period=5, threshold=5) v4 ==== 13+0+0 (secure 0 0 0) 0x55e45f2e0c40 con 0x55e45e8f5000
debug 2022-04-01T11:01:48.526+0000 7ff8060b8700 1 -- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 msgr2=0x55e45e772500 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 16
debug 2022-04-01T11:01:48.526+0000 7ff8060b8700 1 -- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 msgr2=0x55e45e772500 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
debug 2022-04-01T11:01:48.526+0000 7ff8060b8700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 secure :-1 s=READY pgs=764162 cs=0 l=1 rev1=1 rx=0x55e46198c3f0 tx=0x55e45e8a5bc0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)
debug 2022-04-01T11:01:48.526+0000 7ff8060b8700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 secure :-1 s=READY pgs=764162 cs=0 l=1 rev1=1 rx=0x55e46198c3f0 tx=0x55e45e8a5bc0).stop
debug 2022-04-01T11:01:48.526+0000 7ff7fe8a9700 1 -- 192.168.9.207:0/2682856153 --> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] -- mgrreport(unknown.controller2 +98-0 packed 1158 daemon_metrics=1 task_status=0) v9 -- 0x55e471b3a380 con 0x55e45e8f5000
debug 2022-04-01T11:01:48.526+0000 7ff7fe8a9700 1 -- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 msgr2=0x55e45e772500 unknown :-1 s=STATE_CLOSED l=1).mark_down
debug 2022-04-01T11:01:48.526+0000 7ff7fe8a9700 1 --2- 192.168.9.207:0/2682856153 >> [v2:192.168.9.208:6834/1694443943,v1:192.168.9.208:6835/1694443943] conn(0x55e45e8f5000 0x55e45e772500 unknown :-1 s=CLOSED pgs=764162 cs=0 l=1 rev1=1 rx=0 tx=0).stop
debug 2022-04-01T11:01:48.574+0000 7ff8048b5700 4 mon.controller2@-1(probing) e16 probe_timeout 0x55e468a8ac20
debug 2022-04-01T11:01:48.574+0000 7ff8048b5700 10 mon.controller2@-1(probing) e16 bootstrap
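Note that every probe reply ends in "ready to join, but i'm not in the monmap ... trying to join", while monmap e16 above only ever contains the three in-quorum mons. The same monmap can be confirmed from any quorum member with the standard commands:

ceph mon stat
ceph mon dump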
On the mgr it tries to connect to, I can see the following (debug disabled again):

debug 2022-04-01T11:17:28.260+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339236: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 6.9 MiB/s rd, 2.3 MiB/s wr, 1.50k op/s
debug 2022-04-01T11:17:28.646+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:28.647+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:29.029+0000 7fe861f7e700 0 log_channel(audit) log [DBG] : from='client.141447932 -' entity='client.iscsi.iscsi.d3ovirt1.yvkyzk' cmd=[{"prefix": "service status", "format": "json"}]: dispatch
debug 2022-04-01T11:17:29.646+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:29.647+0000 7fe988d58700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:30.262+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339237: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 8.5 MiB/s rd, 1.5 MiB/s wr, 1.98k op/s
debug 2022-04-01T11:17:30.646+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:30.647+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:31.650+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:31.651+0000 7fe988d58700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:32.264+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339238: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 8.6 MiB/s rd, 1.5 MiB/s wr, 1.50k op/s
debug 2022-04-01T11:17:32.647+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:32.648+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:33.647+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:33.647+0000 7fe988d58700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:33.932+0000 7fe861f7e700 0 log_channel(audit) log [DBG] : from='client.144629755 -' entity='client.iscsi.iscsi.d3ovirt2.aovbcb' cmd=[{"prefix": "service status", "format": "json"}]: dispatch
debug 2022-04-01T11:17:34.267+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339239: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 10 MiB/s rd, 1.8 MiB/s wr, 1.53k op/s
debug 2022-04-01T11:17:34.647+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:34.648+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
192.168.9.206 - - [01/Apr/2022:11:17:35] "GET /metrics HTTP/1.1" 200 358194 "" "Prometheus/2.24.1"
debug 2022-04-01T11:17:35.647+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:35.648+0000 7fe988d58700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:36.269+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339240: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 11 MiB/s rd, 1.9 MiB/s wr, 2.01k op/s
debug 2022-04-01T11:17:36.647+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:36.648+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
192.168.9.211 - - [01/Apr/2022:11:17:37] "GET /metrics HTTP/1.1" 200 358194 "" "Prometheus/2.18.1"
debug 2022-04-01T11:17:37.648+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:37.649+0000 7fe988d58700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:38.271+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339241: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 11 MiB/s rd, 2.4 MiB/s wr, 1.52k op/s
debug 2022-04-01T11:17:38.647+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:38.648+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:39.032+0000 7fe861f7e700 0 log_channel(audit) log [DBG] : from='client.141447932 -' entity='client.iscsi.iscsi.d3ovirt1.yvkyzk' cmd=[{"prefix": "service status", "format": "json"}]: dispatch
debug 2022-04-01T11:17:39.649+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:39.649+0000 7fe988d58700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
debug 2022-04-01T11:17:40.273+0000 7fe860f7c700 0 log_channel(cluster) log [DBG] : pgmap v339242: 753 pgs: 3 active+clean+scrubbing+deep, 750 active+clean; 13 TiB data, 51 TiB used, 42 TiB / 94 TiB avail; 12 MiB/s rd, 2.4 MiB/s wr, 2.06k op/s
debug 2022-04-01T11:17:40.649+0000 7fe861f7e700 2 mgr.server handle_open ignoring open from mon.controller2 192.168.9.207:0/2682856153; not ready for session (expect reconnect)
debug 2022-04-01T11:17:40.650+0000 7fe989559700 1 mgr finish mon failed to return metadata for mon.controller2: (2) No such file or directory
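The "(2) No such file or directory" looks like a plain ENOENT from the mon-metadata lookup, presumably the same error a manual query would return while controller2 is not in the monmap:

ceph mon metadata controller2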
Kind Regards
Thomas Bruckmann
Systemadministrator Cloud Dienste
E: Thomas.Bruckmann@xxxxxxxxxxxxx
softgarden e-recruiting GmbH
Tauentzienstraße 14 | 10789 Berlin
https://softgarden.com/de
Gesellschaft mit beschränkter Haftung, Amtsgericht Berlin-Charlottenburg HRB 114159 B | USt-ID: DE260440441 | Geschäftsführer: Mathias Heese, Stefan Schäffler, Claus Müller

From: Thomas Bruckmann <Thomas.Bruckmann@xxxxxxxxxxxxx>
Date: Friday, 1 April 2022 at 12:26
To: Konstantin Shalygin <k0ste@xxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Ceph Mon not able to authenticate

Hi Konstantin,

Thank you for your reply! Yes, we have meanwhile double-checked that the network is working correctly. Those nodes are also used as K8s workers, and all the containers etc. run correctly there; the OSD, mgr and MDS containers are running fine as well. The only problem is the mon containers, and only on the nodes we have used as mons for years; it is no problem to deploy a mon to any other K8s worker node in the same subnet.

I also enabled debug logs at level 20 and cannot find any reason or even a hint in those logs why these mons are not able to join. Interestingly, it is not limited to one machine: on every one of the old mon machines, once the mon container is redeployed or restarted it does not come up again. A hardware error on three machines at the same time is quite unlikely.

I also fully redeployed the mon containers and ensured after removing them that absolutely no artifacts were left on the machine, i.e. nothing remained in

/var/lib/ceph/<id>/mon.*
/var/run/ceph/<id>
/var/lib/ceph/<id>/crash

The only directories mounted into the container where I did not delete files are /dev, /udev and /var/log/ceph/<id>. Additionally, I removed the stopped Ceph containers before redeploying; the redeploy itself looked roughly like the sketch below.
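For completeness (daemon name, host and IP are from our setup; with a fully managed mon service the placement spec may need adjusting instead):

ceph orch daemon rm mon.controller2 --force          # remove the daemon from the host
ceph mon remove controller2                          # make sure it is also gone from the monmap
ceph orch daemon add mon controller2:192.168.9.207   # redeploy on the same host/IP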
I have no idea how I could remove a Ceph mon from a machine any more thoroughly. I hope you or someone else still has an idea, or at least a direction; even a hint about what exactly to check in the server's network stack would be welcome.

Kind Regards,
Thomas Bruckmann

From: Konstantin Shalygin <k0ste@xxxxxxxx>
Date: Wednesday, 30 March 2022 at 10:05
To: Thomas Bruckmann <Thomas.Bruckmann@xxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: Ceph Mon not able to authenticate

Hi,

You are not the first with this issue. If you are 146% sure that it is not a network (arp, ip, mtu, firewall) issue, I suggest removing this mon and deploying it again, or deploying it on another (unused) IP address.

Also, you can add --debug_ms=20; you should then see some "lossy channel" messages before the quorum join fails.

k

> On 29 Mar 2022, at 15:20, Thomas Bruckmann <Thomas.Bruckmann@xxxxxxxxxxxxx> wrote:
>
> Hello again,
> I have increased the debug level to the maximum for the mons now, and I still have no idea what the problem could be.
>
> So I am posting the debug log of the mon that fails to join here, in the hope that someone can help me. In addition, the mon that is not joining stays quite long in the probing phase; sometimes it switches to synchronizing, which seems to work, and after that it is back to probing.
>
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 bootstrap
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 sync_reset_requester
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 unregister_cluster_logger - not registered
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 cancel_probe_timeout (none scheduled)
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 monmap e16: 3 mons at {controller1=[v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0],controller4=[v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0],controller5=[v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0]}
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 _reset
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing).auth v46972 _set_mon_num_rank num 0 rank 0
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 cancel_probe_timeout (none scheduled)
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 timecheck_finish
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 15 mon.controller2@-1(probing) e16 health_tick_stop
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 15 mon.controller2@-1(probing) e16 health_interval_stop
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 scrub_event_cancel
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 scrub_reset
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 cancel_probe_timeout (none scheduled)
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 reset_probe_timeout 0x55c46fbb8d80 after 2 seconds
> debug 2022-03-29T11:10:53.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 probing other monitors
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 mon.controller2@-1(probing) e16 _ms_dispatch existing session 0x55c46f8d4900 for mon.2
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 mon.controller2@-1(probing) e16 entity_name global_id 0 (none) caps allow *
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 is_capable service=mon command= read addr v2:192.168.9.210:3300/0 on cap allow *
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 allow so far , doing grant allow *
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 allow all
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 handle_probe mon_probe(reply 9d036488-fb4f-4e5b-85ec-4ccf75501b48 name controller5 quorum 0,1,2 leader 0 paxos( fc 133912517 lc 133913211 ) mon_release pacific) v8
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 handle_probe_reply mon.2 v2:192.168.9.210:3300/0 mon_probe(reply 9d036488-fb4f-4e5b-85ec-4ccf75501b48 name controller5 quorum 0,1,2 leader 0 paxos( fc 133912517 lc 133913211 ) mon_release pacific) v8
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 monmap is e16: 3 mons at {controller1=[v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0],controller4=[v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0],controller5=[v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0]}
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 peer name is controller5
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 existing quorum 0,1,2
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 peer paxos version 133913211 vs my version 133913204 (ok)
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 ready to join, but i'm not in the monmap/my addr is blank/location is wrong, trying to join
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 mon.controller2@-1(probing) e16 _ms_dispatch existing session 0x55c46f8d4b40 for mon.1
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 mon.controller2@-1(probing) e16 entity_name global_id 0 (none) caps allow *
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 is_capable service=mon command= read addr v2:192.168.9.209:3300/0 on cap allow *
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 allow so far , doing grant allow *
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 20 allow all
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 handle_probe mon_probe(reply 9d036488-fb4f-4e5b-85ec-4ccf75501b48 name controller4 quorum 0,1,2 leader 0 paxos( fc 133912517 lc 133913211 ) mon_release pacific) v8
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 handle_probe_reply mon.1 v2:192.168.9.209:3300/0 mon_probe(reply 9d036488-fb4f-4e5b-85ec-4ccf75501b48 name controller4 quorum 0,1,2 leader 0 paxos( fc 133912517 lc 133913211 ) mon_release pacific) v8
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 monmap is e16: 3 mons at {controller1=[v2:192.168.9.206:3300/0,v1:192.168.9.206:6789/0],controller4=[v2:192.168.9.209:3300/0,v1:192.168.9.209:6789/0],controller5=[v2:192.168.9.210:3300/0,v1:192.168.9.210:6789/0]}
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 peer name is controller4
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 existing quorum 0,1,2
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 peer paxos version 133913211 vs my version 133913204 (ok)
> debug 2022-03-29T11:10:53.695+0000 7f81be00c700 10 mon.controller2@-1(probing) e16 ready to join, but i'm not in the monmap/my addr is blank/location is wrong, trying to join
> debug 2022-03-29T11:10:54.453+0000 7f81c2014700 10 mon.controller2@-1(probing) e16 get_authorizer for mgr
> debug 2022-03-29T11:10:55.453+0000 7f81c2014700 10 mon.controller2@-1(probing) e16 get_authorizer for mgr
> debug 2022-03-29T11:10:55.695+0000 7f81c0811700 4 mon.controller2@-1(probing) e16 probe_timeout 0x55c46fbb8d80
> debug 2022-03-29T11:10:55.695+0000 7f81c0811700 10 mon.controller2@-1(probing) e16 bootstrap
>
> Kind Regards,
> Thomas Bruckmann
>
> From: Thomas Bruckmann <Thomas.Bruckmann@xxxxxxxxxxxxx>
> Date: Thursday, 24 March 2022 at 17:06
> To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> Subject: Ceph Mon not able to authenticate
>
> Hello,
> We are running Ceph 16.2.6 and having trouble with our mons; everything is managed via ceph orch and running in containers. Since we switched our firewall in the DC (which also does DNS), our ceph mon daemons are not able to authenticate when they are restarted.
>
> The error message in the monitor log is:
>
> debug 2022-03-24T14:25:12.716+0000 7fa0dc2df700 1 mon.2@-1(probing) e13 handle_auth_request failed to assign global_id
>
> What we have already tried to solve the problem:
>
> * Removed the mon fully from the node (including all artifacts in the FS)
> * Double-checked whether the mon was still in the monmap after removing it (it is not)
> * Added other mons (which were previously not mons) to ensure a unique and synced monmap, and tried adding the failing mon -> no success
> * Shut down a running mon (not one of the brand-new ones) and tried bringing it up again -> same error
>
> It does not seem to be an error with the monmap; however, manipulating the monmap manually is currently not possible, since the system is in production and we cannot shut down the whole FS.
>
> Another blog post (I cannot find the link anymore) says the problem could somehow be related to DNS resolution, i.e. that the DNS name behind the IP may have changed. For each of our initial mons, three different DNS names are returned on a reverse lookup; since we switched the firewall, the order in which those names are returned may have changed. I don't know whether this could be the problem.
>
> Does anyone have an idea how to solve the problem?
>
> Kind Regards,
> Thomas Bruckmann
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx