Hi,

I ran mkfs for a new mon, but the mon is stuck in probing. In the debug log I see: fault on lossy channel, failing. Does this mean a bad (lossy) network (crc mismatch)?

2021-10-04 16:22:24.707 7f5952761700 10 mon.mon2@-1(probing) e10 probing other monitors
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x5602864cd480
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x5602864cd480 con 0x560285455600
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x5602893ffc00
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x5602893ffc00 con 0x560285455a80
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x560288e98a00
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x560288e98a00 con 0x5602862d8000
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] <== mon.1 v2:10.40.0.83:3300/0 581 ==== mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7 ==== 504+0+0 (crc 0 0 0) 0x560287a94f00 con 0x560285455a80
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe_reply mon.1 v2:10.40.0.83:3300/0 mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 monmap is e10: 3 mons at {ceph-01=[v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0],ceph-03=[v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0],ceph-06=[v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0]}
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer name is ceph-03
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 existing quorum 0,1,2
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer paxos version 127723840 vs my version 127723835 (ok)
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 ready to join, but i'm not in the monmap or my addr is blank, trying to join
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- ?+0 0x5602864001c0
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- 0x5602864001c0 con 0x560285455600
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] <== mon.2 v2:10.40.0.86:3300/0 574 ==== mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7 ==== 504+0+0 (crc 0 0 0) 0x56028aa25480 con 0x5602862d8000
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe_reply mon.2 v2:10.40.0.86:3300/0 mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 monmap is e10: 3 mons at {ceph-01=[v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0],ceph-03=[v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0],ceph-06=[v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0]}
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer name is ceph-06
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 existing quorum 0,1,2
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer paxos version 127723840 vs my version 127723835 (ok)
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 ready to join, but i'm not in the monmap or my addr is blank, trying to join
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- ?+0 0x560286400400
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- 0x560286400400 con 0x560285455600
2021-10-04 16:22:24.779 7f594cf56700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2021-10-04 16:22:24.779 7f594cf56700 1 -- 10.40.0.82:0/9719 --> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] -- mgropen(unknown.mon2) v3 -- 0x56028541d900 con 0x560287e40000
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
2021-10-04 16:22:24.779 7f5953f64700 10 mon.mon2@-1(probing) e10 ms_get_authorizer for mgr
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).ready entity=mgr.62450337 client_cookie=5a76b276e3a3deca server_cookie=0 in_seq=0 out_seq=0
2021-10-04 16:22:24.779 7f5953f64700 1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 28
2021-10-04 16:22:24.779 7f5953f64700 1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-10-04 16:22:24.779 7f594c755700 1 -- 10.40.0.82:0/9719 <== mgr.62450337 v2:10.40.0.81:6898/2507925 1 ==== mgrconfigure(period=5, threshold=5) v3 ==== 12+0+0 (crc 0 0 0) 0x560287dd3a20 con 0x560287e40000
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).stop
2021-10-04 16:22:24.779 7f594c755700 1 -- 10.40.0.82:0/9719 --> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] -- mgrreport(unknown.mon2 +100-0 packed 1174 task_status=0) v8 -- 0x5602860a9880 con 0x560287e40000
2021-10-04 16:22:24.779 7f594c755700 1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 unknown :-1 s=STATE_CLOSED l=1).mark_down
2021-10-04 16:22:24.779 7f594c755700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=CLOSED pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).stop
2021-10-04 16:22:24.839 7f594ef5a700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).accept
2021-10-04 16:22:24.839 7f5953f64700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
2021-10-04 16:22:24.839 7f5953f64700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 msgr2=0x5602875c4a00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2021-10-04 16:22:24.839 7f5953f64700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).write hello frame write failed r=-32 ((32) Broken pipe)
2021-10-04 16:22:24.839 7f5953f64700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).stop
2021-10-04 16:22:24.839 7f594ff5c700 10 mon.mon2@-1(probing) e10 ms_handle_reset 0x560287aac880

Thanks,
k