Re: Can't join new mon - lossy channel, failing

This line bothers me:

[v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)

It may be a good idea to run the mon under strace and see why your network does not permit the frame read. msgr2 shows the message you referred to when no data is actually received from the network.
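Before attaching strace, it can help to see which connections report an errno at all. Below is a minimal Python sketch, not anything shipped with Ceph: the function name messenger_errors and the three regexes are my own assumptions about the log layout, based only on the lines quoted in this thread. It pulls the connection pointer, the messenger function and the errno out of each line that carries one.

```python
import errno
import re

# Assumed patterns, derived only from the log lines quoted in this thread.
ERRNO_RE = re.compile(r"\((\d+)\) ([A-Za-z][A-Za-z ]*)")  # "(32) Broken pipe"
CONN_RE = re.compile(r"conn\((0x[0-9a-f]+)")               # "conn(0x560287e40000"
FUNC_RE = re.compile(r"\)\.(\w+)")                         # ")._try_send"


def messenger_errors(lines):
    """Yield (conn, function, errno_name, text) for lines that carry an errno."""
    for line in lines:
        err = ERRNO_RE.search(line)
        conn = CONN_RE.search(line)
        if not (err and conn):
            continue  # not a messenger line, or no errno printed
        func = FUNC_RE.search(line)
        code = int(err.group(1))
        yield (
            conn.group(1),
            func.group(1) if func else "?",
            errno.errorcode.get(code, str(code)),  # symbolic name, e.g. EPIPE
            err.group(2),
        )


# Three lines copied from the log in this thread; the last one has no errno.
SAMPLE = [
    "2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> "
    "[v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] "
    "conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 "
    "rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble "
    "failed r=-1 ((1) Operation not permitted)",
    "2021-10-04 16:22:24.839 7f5953f64700 1 -- "
    "[v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 "
    "msgr2=0x5602875c4a00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0)"
    "._try_send send error: (32) Broken pipe",
    "2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> "
    "[v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] "
    "conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 "
    "rev1=1 rx=0 tx=0).stop",
]

if __name__ == "__main__":
    for conn, func, name, text in messenger_errors(SAMPLE):
        print(f"{conn} {func}: {name} ({text})")
```

The sketch only reports what the log printed. Whether r=-1 corresponds to a real EPERM returned by the kernel is what the strace run would confirm.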

Regards,
Vladimir

On 5 October 2021 12:27:10 am AEDT, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
Hi,

I ran mkfs for a new mon, but the mon is stuck in probing. In the debug log I see: fault on lossy channel, failing. Is this a bad (lossy) network (crc mismatch)?


2021-10-04 16:22:24.707 7f5952761700 10 mon.mon2@-1(probing) e10 probing other monitors
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x5602864cd480
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x5602864cd480 con 0x560285455600
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x5602893ffc00
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x5602893ffc00 con 0x560285455a80
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- ?+0 0x560288e98a00
2021-10-04 16:22:24.707 7f5952761700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0] -- mon_probe(probe 677f4be1-cd98-496d-8b50-1f99df0df670 name mon2 new mon_release 14) v7 -- 0x560288e98a00 con 0x5602862d8000
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] <== mon.1 v2:10.40.0.83:3300/0 581 ==== mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7 ==== 504+0+0 (crc 0 0 0) 0x560287a94f00 con 0x560285455a80
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe_reply mon.1 v2:10.40.0.83:3300/0 mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-03 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 monmap is e10: 3 mons at {ceph-01=[v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0],ceph-03=[v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0],ceph-06=[v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0]}
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer name is ceph-03
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 existing quorum 0,1,2
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer paxos version 127723840 vs my version 127723835 (ok)
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 ready to join, but i'm not in the monmap or my addr is blank, trying to join
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- ?+0 0x5602864001c0
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- 0x5602864001c0 con 0x560285455600
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] <== mon.2 v2:10.40.0.86:3300/0 574 ==== mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7 ==== 504+0+0 (crc 0 0 0) 0x56028aa25480 con 0x5602862d8000
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 handle_probe_reply mon.2 v2:10.40.0.86:3300/0 mon_probe(reply 677f4be1-cd98-496d-8b50-1f99df0df670 name ceph-06 quorum 0,1,2 paxos( fc 127723108 lc 127723840 ) mon_release 14) v7
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 monmap is e10: 3 mons at {ceph-01=[v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0],ceph-03=[v2:10.40.0.83:3300/0,v1:10.40.0.83:6789/0],ceph-06=[v2:10.40.0.86:3300/0,v1:10.40.0.86:6789/0]}
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer name is ceph-06
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 existing quorum 0,1,2
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 peer paxos version 127723840 vs my version 127723835 (ok)
2021-10-04 16:22:24.707 7f594ff5c700 10 mon.mon2@-1(probing) e10 ready to join, but i'm not in the monmap or my addr is blank, trying to join
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] send_to--> mon [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- ?+0 0x560286400400
2021-10-04 16:22:24.707 7f594ff5c700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] --> [v2:10.40.0.81:3300/0,v1:10.40.0.81:6789/0] -- mon_join(mon2 [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0]) v2 -- 0x560286400400 con 0x560285455600
2021-10-04 16:22:24.779 7f594cf56700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2021-10-04 16:22:24.779 7f594cf56700 1 -- 10.40.0.82:0/9719 --> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] -- mgropen(unknown.mon2) v3 -- 0x56028541d900 con 0x560287e40000
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
2021-10-04 16:22:24.779 7f5953f64700 10 mon.mon2@-1(probing) e10 ms_get_authorizer for mgr
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).ready entity=mgr.62450337 client_cookie=5a76b276e3a3deca server_cookie=0 in_seq=0 out_seq=0
2021-10-04 16:22:24.779 7f5953f64700 1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 28
2021-10-04 16:22:24.779 7f5953f64700 1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2021-10-04 16:22:24.779 7f594c755700 1 -- 10.40.0.82:0/9719 <== mgr.62450337 v2:10.40.0.81:6898/2507925 1 ==== mgrconfigure(period=5, threshold=5) v3 ==== 12+0+0 (crc 0 0 0) 0x560287dd3a20 con 0x560287e40000
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)
2021-10-04 16:22:24.779 7f5953f64700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 crc :-1 s=READY pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).stop
2021-10-04 16:22:24.779 7f594c755700 1 -- 10.40.0.82:0/9719 --> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] -- mgrreport(unknown.mon2 +100-0 packed 1174 task_status=0) v8 -- 0x5602860a9880 con 0x560287e40000
2021-10-04 16:22:24.779 7f594c755700 1 -- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 msgr2=0x560287e56000 unknown :-1 s=STATE_CLOSED l=1).mark_down
2021-10-04 16:22:24.779 7f594c755700 1 --2- 10.40.0.82:0/9719 >> [v2:10.40.0.81:6898/2507925,v1:10.40.0.81:6899/2507925] conn(0x560287e40000 0x560287e56000 unknown :-1 s=CLOSED pgs=16872 cs=0 l=1 rev1=1 rx=0 tx=0).stop
2021-10-04 16:22:24.839 7f594ef5a700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).accept
2021-10-04 16:22:24.839 7f5953f64700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
2021-10-04 16:22:24.839 7f5953f64700 1 -- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 msgr2=0x5602875c4a00 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send send error: (32) Broken pipe
2021-10-04 16:22:24.839 7f5953f64700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).write hello frame write failed r=-32 ((32) Broken pipe)
2021-10-04 16:22:24.839 7f5953f64700 1 --2- [v2:10.40.0.82:3300/0,v1:10.40.0.82:6789/0] >> conn(0x560287aac880 0x5602875c4a00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).stop
2021-10-04 16:22:24.839 7f594ff5c700 10 mon.mon2@-1(probing) e10 ms_handle_reset 0x560287aac880



Thanks,
k
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx