Re: Can't join new mon - lossy channel, failing

This line bothers me:

[v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)

It may be a good idea to run the mon under strace and see why the frame read is not permitted on your network. msgr2 shows the message you referred to when no data is actually received from the network.
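To make the error fields in such lines easier to compare, here is a minimal Python sketch that pulls the return code and errno out of a messenger log line and maps the errno to its symbolic name. The names ERR_RE and decode_error are hypothetical helpers made up for this illustration, not part of Ceph, and the pattern only assumes the "r=-N ((N) text)" suffix seen in the log quoted below.

```python
import errno
import re

# Assumed format: the trailing "r=-1 ((1) Operation not permitted)" part
# of a messenger log line, as it appears in the quoted log.
ERR_RE = re.compile(r"r=(-?\d+) \(\((\d+)\) ([^)]*)\)")


def decode_error(line):
    """Return (return_code, errno_name, message), or None if no error suffix is found."""
    m = ERR_RE.search(line)
    if m is None:
        return None
    code = int(m.group(2))
    # errno.errorcode maps numeric errno values to names such as 'EPERM'.
    return int(m.group(1)), errno.errorcode.get(code, "UNKNOWN"), m.group(3)


preamble = (
    "conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 "
    "rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble "
    "failed r=-1 ((1) Operation not permitted)"
)
hello = "write hello frame write failed r=-32 ((32) Broken pipe)"

print(decode_error(preamble))
print(decode_error(hello))
```

On Linux, errno 1 decodes to EPERM and errno 32 to EPIPE, which matches the text printed in the log. The sketch only decodes what the log already says; strace is still needed to see which system call actually failed.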

Regards,
Vladimir

On 5 October 2021 12:27:10 am AEDT, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
>Hi,
>
>I ran mkfs for a new mon, but the mon is stuck in probing. With debug logging I see: fault on lossy channel, failing. Does this mean a bad (lossy) network (crc mismatch)?
>
>
>2021-10-04 16:22:24.707 7f5952761700 10 mon.mon2@-1(probing) e10 probing other monitors
>2021-10-04 16:22:24.707 7f5952761700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x5602864cd480
>2021-10-04 16:22:24.707 7f5952761700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x5602864cd480 con 0x560285455600
>2021-10-04 16:22:24.707 7f5952761700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x5602893ffc00
>2021-10-04 16:22:24.707 7f5952761700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x5602893ffc00 con 0x560285455a80
>2021-10-04 16:22:24.707 7f5952761700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x560288e98a00
>2021-10-04 16:22:24.707 7f5952761700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x560288e98a00 con 0x5602862d8000
>2021-10-04 16:22:24.707 7f594ff5c700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] <== mon.1 v2:10.40.0.83:3300/0 581 ==== mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7 ==== 504+0+0 (crc 0 0 0) 0x560287a94f00 con 0x560285455a80
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe_reply mon.1 v2:10.40.0.83:3300/0 mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  monmap is e10: 3 mons at {ceph-01=[v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0],ceph-03=[v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0],ceph-06=[v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0]}
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  peer name is ceph-03
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  existing quorum 0,1,2
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  peer paxos version 127723840 vs my version 127723835 (ok)
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  ready to join, but i'm not in the monmap or my addr is blank, trying to join
>2021-10-04 16:22:24.707 7f594ff5c700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- ?+0 0x5602864001c0
>2021-10-04 16:22:24.707 7f594ff5c700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- 0x5602864001c0 con 0x560285455600
>2021-10-04 16:22:24.707 7f594ff5c700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] <== mon.2 v2:10.40.0.86:3300/0 574 ==== mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7 ==== 504+0+0 (crc 0 0 0) 0x56028aa25480 con 0x5602862d8000
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe_reply mon.2 v2:10.40.0.86:3300/0 mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  monmap is e10: 3 mons at {ceph-01=[v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0],ceph-03=[v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0],ceph-06=[v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0]}
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  peer name is ceph-06
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  existing quorum 0,1,2
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  peer paxos version 127723840 vs my version 127723835 (ok)
>2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10  ready to join, but i'm not in the monmap or my addr is blank, trying to join
>2021-10-04 16:22:24.707 7f594ff5c700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- ?+0 0x560286400400
>2021-10-04 16:22:24.707 7f594ff5c700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- 0x560286400400 con 0x560285455600
>2021-10-04 16:22:24.779 7f594cf56700  1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
>2021-10-04 16:22:24.779 7f594cf56700  1 -- 10.40.0.82:0/9719 --> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] -- mgropen(unknown.mon2) v3 -- 0x56028541d900 con 0x560287e40000
>2021-10-04 16:22:24.779 7f5953f64700  1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
>2021-10-04 16:22:24.779 7f5953f64700 10 mon.mon2@-1(probing) e10 ms_get_authorizer for mgr
>2021-10-04 16:22:24.779 7f5953f64700  1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).ready entity=mgr.62450337 client_cookie=5a76b276e3a3deca server_cookie=0 in_seq=0 out_seq=0
>2021-10-04 16:22:24.779 7f5953f64700  1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 28
>2021-10-04 16:22:24.779 7f5953f64700  1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
>2021-10-04 16:22:24.779 7f594c755700  1 -- 10.40.0.82:0/9719 <== mgr.62450337 v2:10.40.0.81:6898/2507925 1 ==== mgrconfigure(period=5, threshold=5) v3 ==== 12+0+0 (crc 0 0 0) 0x560287dd3a20 con 0x560287e40000
>2021-10-04 16:22:24.779 7f5953f64700  1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)
>2021-10-04 16:22:24.779 7f5953f64700  1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).stop
>2021-10-04 16:22:24.779 7f594c755700  1 -- 10.40.0.82:0/9719 --> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] -- mgrreport(unknown.mon2 +100-0 packed 1174 task_status=0) v8 -- 0x5602860a9880 con 0x560287e40000
>2021-10-04 16:22:24.779 7f594c755700  1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 unknown :-1 s=STATE_CLOSED l=1).mark_down
>2021-10-04 16:22:24.779 7f594c755700  1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=CLOSED pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).stop
>2021-10-04 16:22:24.839 7f594ef5a700  1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >>  conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).accept
>2021-10-04 16:22:24.839 7f5953f64700  1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >>  conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
>2021-10-04 16:22:24.839 7f5953f64700  1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >>  conn(0x560287aac880 msgr2=0x5602875c4a00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
>2021-10-04 16:22:24.839 7f5953f64700  1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >>  conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).write hello frame write failed r=-32 ((32) Broken pipe)
>2021-10-04 16:22:24.839 7f5953f64700  1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >>  conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).stop
>2021-10-04 16:22:24.839 7f594ff5c700 10 mon.mon2@-1(probing) e10 ms_handle_reset 0x560287aac880
>
>
>
>Thanks,
>k
>_______________________________________________
>Dev mailing list -- dev@xxxxxxx
>To unsubscribe send an email to dev-leave@xxxxxxx
