OK, thank you very much. I will try to contact them and post an update on the problem. In the meantime, I will try to debug it by setting up just one mon and one osd. Thanks again.
On Mon, Jul 23, 2018 at 3:49 PM John Hearns <hearnsj@xxxxxxxxxxxxxx> wrote:
Will, looking at the logs which you sent, the connection cannot be set up.
I did try Googling for these error messages, and I could not find anything definite.
As an aside, QP = Queue Pair, which is the structure set up to transfer information across an IB network. Think of it like a TCP connection.
I think you should contact Mellanox support over this one. They are really good guys.

On 23 July 2018 at 08:14, Will Zhao <zhao6305@xxxxxxxxx> wrote:
Hi John:
This is the information ibv_devinfo gives.

hca_id: mlx4_0
    transport: InfiniBand (0)
    fw_ver: 2.35.5100
    node_guid: e41d:2d03:0072:ed70
    sys_image_guid: e41d:2d03:0072:ed73
    vendor_id: 0x02c9
    vendor_part_id: 4099
    hw_ver: 0x1
    board_id: MT_1090110019
    phys_port_cnt: 2
    Device ports:
        port: 1
            state: PORT_DOWN (1)
            max_mtu: 4096 (5)
            active_mtu: 4096 (5)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: InfiniBand
        port: 2
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 4096 (5)
            sm_lid: 2
            port_lid: 11
            port_lmc: 0x00
            link_layer: InfiniBand
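
For anyone comparing several hosts, the port states in output like the above can be pulled out with a few lines of Python. This is only a rough sketch: the function name parse_port_states and the shortened SAMPLE text are made up for illustration, and it assumes the "port:" / "state:" layout shown in this message rather than any guaranteed ibv_devinfo format.

```python
import re


def parse_port_states(devinfo_text):
    """Return {port_number: state_name} from ibv_devinfo-style text.

    Hypothetical helper: assumes each port block starts with a line
    'port: N' and is followed by a line 'state: NAME (code)'.
    """
    ports = {}
    current_port = None
    for raw_line in devinfo_text.splitlines():
        line = raw_line.strip()
        # 'port: 1' starts a new port block; 'port_lid: 11' must not match.
        port_match = re.match(r"port:\s*(\d+)$", line)
        if port_match:
            current_port = int(port_match.group(1))
            continue
        state_match = re.match(r"state:\s*(\w+)", line)
        if state_match and current_port is not None:
            ports[current_port] = state_match.group(1)
    return ports


# Shortened copy of the output quoted in this thread.
SAMPLE = """\
hca_id: mlx4_0
    phys_port_cnt: 2
    Device ports:
        port: 1
            state: PORT_DOWN (1)
            port_lid: 0
        port: 2
            state: PORT_ACTIVE (4)
            port_lid: 11
"""

states = parse_port_states(SAMPLE)
active_ports = [port for port, state in states.items() if state == "PORT_ACTIVE"]
print(states)
print(active_ports)
```

On this host only port 2 is active, which agrees with the "binding_port found active port 2" line in the log further down.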
On Fri, Jul 20, 2018 at 7:09 PM John Hearns <hearnsj@xxxxxxxxxxxxxx> wrote:
What does ibv_devinfo give you?

On 20 July 2018 at 12:13, Will Zhao <zhao6305@xxxxxxxxx> wrote:
Now I have added the option "debug ms = 20/20" to the global section of ceph.conf to see more details about the errors. This time "ceph -s" prints thousands of lines; here is part of the log pasted from the output:

2018-07-20 16:12:49.994715 7f3a3be8e700 20 Infiniband verify_prereq ms_async_rdma_enable_hugepage value is: 0
2018-07-20 16:12:49.994723 7f3a3be8e700 20 Infiniband Infiniband constructing Infiniband...
2018-07-20 16:12:49.994748 7f3a3be8e700 20 RDMAStack RDMAStack constructing RDMAStack...
2018-07-20 16:12:49.994750 7f3a3be8e700 20 RDMAStack creating RDMAStack:0x7f3a340b5448 with dispatcher:0x7f3a340b5558
2018-07-20 16:12:49.994924 7f3a3970e700 2 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).set_owner idx=1 owner=139888048531200
2018-07-20 16:12:49.994990 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event started fd=7 mask=1 original mask is 0
2018-07-20 16:12:49.994990 7f3a38f0d700 2 Event(0x7f3a34110850 nevent=5000 time_id=1).set_owner idx=2 owner=139888040138496
2018-07-20 16:12:49.994999 7f3a3970e700 20 EpollDriver.add_event add event fd=7 cur_mask=0 add_mask=1 to 6
2018-07-20 16:12:49.994991 7f3a39f0f700 2 Event(0x7f3a340b5770 nevent=5000 time_id=1).set_owner idx=0 owner=139888056923904
2018-07-20 16:12:49.995009 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event end fd=7 mask=1 original mask is 1
2018-07-20 16:12:49.995011 7f3a38f0d700 20 Event(0x7f3a34110850 nevent=5000 time_id=1).create_file_event create event started fd=11 mask=1 original mask is 0
2018-07-20 16:12:49.995013 7f3a39f0f700 20 Event(0x7f3a340b5770 nevent=5000 time_id=1).create_file_event create event started fd=4 mask=1 original mask is 0
2018-07-20 16:12:49.995016 7f3a38f0d700 20 EpollDriver.add_event add event fd=11 cur_mask=0 add_mask=1 to 10
2018-07-20 16:12:49.995017 7f3a39f0f700 20 EpollDriver.add_event add event fd=4 cur_mask=0 add_mask=1 to 3 2018-07-20 16:12:49.995018 7f3a3970e700 10 stack operator() starting 2018-07-20 16:12:49.995022 7f3a38f0d700 20 Event(0x7f3a34110850 nevent=5000 time_id=1).create_file_event create event end fd=11 mask=1 original mask is 1 2018-07-20 16:12:49.995022 7f3a39f0f700 20 Event(0x7f3a340b5770 nevent=5000 time_id=1).create_file_event create event end fd=4 mask=1 original mask is 1 2018-07-20 16:12:49.995026 7f3a38f0d700 10 stack operator() starting 2018-07-20 16:12:49.995027 7f3a39f0f700 10 stack operator() starting 2018-07-20 16:12:49.995938 7f3a3be8e700 10 -- - ready - 2018-07-20 16:12:49.995946 7f3a3be8e700 1 Processor -- start 2018-07-20 16:12:49.995996 7f3a3be8e700 1 -- - start start 2018-07-20 16:12:49.996535 7f3a3be8e700 10 -- - create_connect 10.10.121.25:6789/0, creating connection and registering 2018-07-20 16:12:49.996574 7f3a3be8e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_NONE pgs=0 cs=0 l=1)._connect csq=0 2018-07-20 16:12:49.996594 7f3a3be8e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).wakeup 2018-07-20 16:12:49.996608 7f3a3be8e700 10 -- - get_connection mon.0 10.10.121.25:6789/0 new 0x7f3a34150270 2018-07-20 16:12:49.996666 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING 2018-07-20 16:12:49.996693 7f3a3be8e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_keepalive 2018-07-20 16:12:49.996700 7f3a3be8e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).wakeup 2018-07-20 16:12:49.996721 7f3a3be8e700 1 -- - --> 10.10.121.25:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- 0x7f3a3414e720 con 0 2018-07-20 16:12:49.996739 7f3a3be8e700 15 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_message inline write is denied, reschedule m=0x7f3a3414e720 
2018-07-20 16:12:50.016836 7f3a3970e700 1 Infiniband Port using experimental verbs for gid 2018-07-20 16:12:50.017216 7f3a3970e700 1 Infiniband Port looking for local GID of type 1 2018-07-20 16:12:50.017224 7f3a3970e700 1 Infiniband Port malformed or no GID supplied, using GID index 0 2018-07-20 16:12:50.017293 7f3a3970e700 10 Infiniband binding_port port 1 is not what we want. state: 1) 2018-07-20 16:12:50.017300 7f3a3970e700 1 Infiniband Port using experimental verbs for gid 2018-07-20 16:12:50.017611 7f3a3970e700 1 Infiniband Port looking for local GID of type 1 2018-07-20 16:12:50.017617 7f3a3970e700 1 Infiniband Port malformed or no GID supplied, using GID index 0 2018-07-20 16:12:50.017664 7f3a3970e700 1 Infiniband binding_port found active port 2 2018-07-20 16:12:50.017680 7f3a3970e700 1 Infiniband init receive queue length is 4096 receive buffers 2018-07-20 16:12:50.017683 7f3a3970e700 1 Infiniband init assigning: 1024 send buffers 2018-07-20 16:12:50.017687 7f3a3970e700 1 Infiniband init device allow 4194303 completion entries 2018-07-20 16:12:50.332936 7f3a3970e700 20 Infiniband init started. 2018-07-20 16:12:50.332966 7f3a3970e700 20 Infiniband init started. 2018-07-20 16:12:50.334175 7f3a3970e700 20 Infiniband init successfully create cq=0x7f3a30008650 2018-07-20 16:12:50.335282 7f3a3970e700 20 Infiniband init successfully create cq=0x7f3a30008a00 2018-07-20 16:12:50.335327 7f3a3970e700 20 Infiniband init started. 2018-07-20 16:12:50.335467 7f3a255d5700 20 RDMAStack polling going to poll tx cq: 0x7f3a30008620 rx cq: 0x7f3a300089d0 2018-07-20 16:12:50.335496 7f3a255d5700 20 Infiniband rearm_notify started. 2018-07-20 16:12:50.335501 7f3a255d5700 20 Infiniband rearm_notify started. 
2018-07-20 16:12:50.335502 7f3a3970e700 20 Infiniband init successfully create queue pair: qp=0x7f3a300092a0 2018-07-20 16:12:50.336033 7f3a3970e700 20 Infiniband init successfully change queue pair to INIT: qp=0x7f3a300092a0 2018-07-20 16:12:50.336049 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0 2018-07-20 16:12:50.336172 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect tcp_fd: 18 2018-07-20 16:12:50.336186 7f3a3970e700 10 Infiniband send_msg sending: 11, 276404, 0, 0, fe80000000000000e41d2d030072ed72 2018-07-20 16:12:50.336212 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event started fd=18 mask=1 original mask is 0 2018-07-20 16:12:50.336227 7f3a3970e700 20 EpollDriver.add_event add event fd=18 cur_mask=0 add_mask=1 to 6 2018-07-20 16:12:50.336234 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event end fd=18 mask=1 original mask is 1 2018-07-20 16:12:50.336240 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event started fd=17 mask=1 original mask is 0 2018-07-20 16:12:50.336243 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=0 add_mask=1 to 6 2018-07-20 16:12:50.336246 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event end fd=17 mask=1 original mask is 1 2018-07-20 16:12:50.336249 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING 2018-07-20 16:12:50.336272 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection nonblock connect inprogress 2018-07-20 16:12:50.336306 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).handle_write 2018-07-20 16:12:50.336314 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 
s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).handle_write 2018-07-20 16:12:50.337500 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection QP: 276404 tcp_fd: 18 notify_fd: 17 2018-07-20 16:12:50.337524 7f3a3970e700 5 Infiniband recv_msg recevd: 0, 0, 0, 0, ?? 2018-07-20 16:12:50.337528 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection peer msg : < 0, 0, 0, 0> 2018-07-20 16:12:50.337533 7f3a3970e700 20 RDMAConnectedSocketImpl activate Choosing gid_index 0, sl 3 2018-07-20 16:12:50.337597 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTR state successfully. 2018-07-20 16:12:50.337637 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTS state successfully. 2018-07-20 16:12:50.337643 7f3a3970e700 20 RDMAConnectedSocketImpl activate QueuePair: 0x7f3a30009140 with qp:0x7f3a300092a0 2018-07-20 16:12:50.337645 7f3a3970e700 20 RDMAConnectedSocketImpl activate handle fake send, wake it up. QP: 276404 2018-07-20 16:12:50.337649 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 0 bytes. iov size: 0 2018-07-20 16:12:50.337655 7f3a3970e700 10 Infiniband send_msg sending: 11, 276404, 0, 0, fe80000000000000e41d2d030072ed72 2018-07-20 16:12:50.337670 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE 2018-07-20 16:12:50.337678 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=1 delmask=2 to 6 2018-07-20 16:12:50.337680 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection connect successfully, ready to send banner 2018-07-20 16:12:50.337695 7f3a3970e700 20 RDMAConnectedSocketImpl send QP: 276404 2018-07-20 16:12:50.337699 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 9 bytes. 
iov size: 1 2018-07-20 16:12:50.337704 7f3a3970e700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1 2018-07-20 16:12:50.337706 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request QP: 276404 0x7f3a30011890 2018-07-20 16:12:50.337713 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS 2018-07-20 16:12:50.337793 7f3a3970e700 20 RDMAConnectedSocketImpl submit finished sending 9 bytes. 2018-07-20 16:12:50.337796 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._try_send sent bytes 9 remaining bytes 0 2018-07-20 16:12:50.337802 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event started fd=17 mask=2 original mask is 1 2018-07-20 16:12:50.337806 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=1 add_mask=2 to 6 2018-07-20 16:12:50.337807 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=1).create_file_event create event end fd=17 mask=2 original mask is 3 2018-07-20 16:12:50.337810 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect write banner done: 10.10.121.25:6789/0 2018-07-20 16:12:50.337816 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE 2018-07-20 16:12:50.337824 7f3a3970e700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 276404 r = 8 2018-07-20 16:12:50.337828 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).handle_write 2018-07-20 16:12:50.337831 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._try_send sent bytes 0 remaining bytes 0 2018-07-20 16:12:54.409747 7f3a255d5700 20 RDMAStack polling got tx cq 
event. 2018-07-20 16:12:54.409773 7f3a255d5700 20 RDMAStack polling tx completion queue got 1 responses. 2018-07-20 16:12:54.409780 7f3a255d5700 1 RDMAStack handle_tx_event connection between server and client not working. Disconnect this now 2018-07-20 16:12:54.409787 7f3a255d5700 1 RDMAConnectedSocketImpl fault tcp fd 18 2018-07-20 16:12:54.409814 7f3a255d5700 20 Infiniband rearm_notify started. 2018-07-20 16:12:54.409816 7f3a255d5700 20 Infiniband rearm_notify started. 2018-07-20 16:12:54.409841 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY 2018-07-20 16:12:54.409876 7f3a3970e700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 276404 r = 8 2018-07-20 16:12:54.409932 7f3a3970e700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).read_bulk reading from fd=17 : Unknown error -104 2018-07-20 16:12:54.409953 7f3a3970e700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).read_until read failed 2018-07-20 16:12:54.409969 7f3a3970e700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection read banner and identify addresses failed 2018-07-20 16:12:54.409984 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=3 delmask=3 to 6 2018-07-20 16:12:54.409995 7f3a3970e700 20 RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct. 
2018-07-20 16:12:54.410025 7f3a3970e700 20 EpollDriver.del_event del event fd=18 cur_mask=1 delmask=1 to 6 2018-07-20 16:12:54.410090 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).fault waiting 0.200000 2018-07-20 16:12:54.610395 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING 2018-07-20 16:12:54.610432 7f3a3970e700 20 Infiniband init started. 2018-07-20 16:12:54.610588 7f3a3970e700 20 Infiniband init successfully create queue pair: qp=0x7f3a3002f900 2018-07-20 16:12:54.611676 7f3a3970e700 20 Infiniband init successfully change queue pair to INIT: qp=0x7f3a3002f900 2018-07-20 16:12:54.611698 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0 2018-07-20 16:12:54.611785 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect tcp_fd: 18 2018-07-20 16:12:54.611813 7f3a3970e700 10 Infiniband send_msg sending: 11, 276420, 2116118, 0, fe80000000000000e41d2d030072ed72 2018-07-20 16:12:54.611858 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=2).create_file_event create event started fd=18 mask=1 original mask is 0 2018-07-20 16:12:54.611866 7f3a3970e700 20 EpollDriver.add_event add event fd=18 cur_mask=0 add_mask=1 to 6 2018-07-20 16:12:54.611872 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=2).create_file_event create event end fd=18 mask=1 original mask is 1 2018-07-20 16:12:54.611875 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=2).create_file_event create event started fd=17 mask=1 original mask is 0 2018-07-20 16:12:54.611877 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=0 add_mask=1 to 6 2018-07-20 16:12:54.611880 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=2).create_file_event create event end fd=17 mask=1 original mask is 1 2018-07-20 16:12:54.611882 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 
s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING 2018-07-20 16:12:54.611894 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection nonblock connect inprogress 2018-07-20 16:12:54.612975 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection QP: 276420 tcp_fd: 18 notify_fd: 17 2018-07-20 16:12:54.613001 7f3a3970e700 5 Infiniband recv_msg recevd: 0, 0, 0, 0, ?? 2018-07-20 16:12:54.613004 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection peer msg : < 0, 0, 0, 0> 2018-07-20 16:12:54.613007 7f3a3970e700 20 RDMAConnectedSocketImpl activate Choosing gid_index 0, sl 3 2018-07-20 16:12:54.613104 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTR state successfully. 2018-07-20 16:12:54.613221 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTS state successfully. 2018-07-20 16:12:54.613243 7f3a3970e700 20 RDMAConnectedSocketImpl activate QueuePair: 0x7f3a3002e6f0 with qp:0x7f3a3002f900 2018-07-20 16:12:54.613249 7f3a3970e700 20 RDMAConnectedSocketImpl activate handle fake send, wake it up. QP: 276420 2018-07-20 16:12:54.613252 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 0 bytes. 
iov size: 0 2018-07-20 16:12:54.613267 7f3a3970e700 10 Infiniband send_msg sending: 11, 276420, 2116118, 0, fe80000000000000e41d2d030072ed72 2018-07-20 16:12:54.613294 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE 2018-07-20 16:12:54.613309 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=1 delmask=2 to 6 2018-07-20 16:12:54.613313 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection connect successfully, ready to send banner 2018-07-20 16:12:54.613338 7f3a3970e700 20 RDMAConnectedSocketImpl send QP: 276420 2018-07-20 16:12:54.613342 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 9 bytes. iov size: 1 2018-07-20 16:12:54.613350 7f3a3970e700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1 2018-07-20 16:12:54.613353 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request QP: 276420 0x7f3a30011890 2018-07-20 16:12:54.613358 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS 2018-07-20 16:12:54.613451 7f3a3970e700 20 RDMAConnectedSocketImpl submit finished sending 9 bytes. 
2018-07-20 16:12:54.613458 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._try_send sent bytes 9 remaining bytes 0 2018-07-20 16:12:54.613473 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=2).create_file_event create event started fd=17 mask=2 original mask is 1 2018-07-20 16:12:54.613477 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=1 add_mask=2 to 6 2018-07-20 16:12:54.613493 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=2).create_file_event create event end fd=17 mask=2 original mask is 3 2018-07-20 16:12:54.613497 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect write banner done: 10.10.121.25:6789/0 2018-07-20 16:12:54.613511 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE 2018-07-20 16:12:54.613519 7f3a3970e700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 276420 r = 8 2018-07-20 16:12:54.613529 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).handle_write 2018-07-20 16:12:54.613537 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._try_send sent bytes 0 remaining bytes 0 2018-07-20 16:12:58.738443 7f3a255d5700 20 RDMAStack polling got tx cq event. 2018-07-20 16:12:58.738459 7f3a255d5700 20 RDMAStack polling tx completion queue got 1 responses. 2018-07-20 16:12:58.738464 7f3a255d5700 1 RDMAStack handle_tx_event connection between server and client not working. 
Disconnect this now 2018-07-20 16:12:58.738468 7f3a255d5700 1 RDMAConnectedSocketImpl fault tcp fd 18 2018-07-20 16:12:58.738477 7f3a255d5700 10 RDMAStack polling finally delete qp=0x7f3a30009140 2018-07-20 16:12:58.738481 7f3a255d5700 20 Infiniband ~QueuePair destroy qp=0x7f3a300092a0 2018-07-20 16:12:58.738528 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY 2018-07-20 16:12:58.738555 7f3a3970e700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 276420 r = 8 2018-07-20 16:12:58.738560 7f3a3970e700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).read_bulk reading from fd=17 : Unknown error -104 2018-07-20 16:12:58.738608 7f3a3970e700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).read_until read failed 2018-07-20 16:12:58.738614 7f3a3970e700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection read banner and identify addresses failed 2018-07-20 16:12:58.738622 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=3 delmask=3 to 6 2018-07-20 16:12:58.738631 7f3a3970e700 20 RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct. 2018-07-20 16:12:58.738633 7f3a3970e700 20 EpollDriver.del_event del event fd=18 cur_mask=1 delmask=1 to 6 2018-07-20 16:12:58.739731 7f3a255d5700 10 RDMAStack handle_async_event event associated qp=0x7f3a3002f900 evt: last WQE reached 2018-07-20 16:12:58.739746 7f3a255d5700 1 RDMAStack handle_async_event it's not forwardly stopped by us, reenable=0x7f3a3002f690 2018-07-20 16:12:58.739749 7f3a255d5700 1 RDMAConnectedSocketImpl fault tcp fd 18 2018-07-20 16:12:58.739756 7f3a255d5700 20 Infiniband rearm_notify started. 
2018-07-20 16:12:58.739757 7f3a255d5700 20 Infiniband rearm_notify started. 2018-07-20 16:12:58.739759 7f3a255d5700 10 RDMAStack polling finally delete qp=0x7f3a3002e6f0 2018-07-20 16:12:58.739762 7f3a255d5700 20 Infiniband ~QueuePair destroy qp=0x7f3a3002f900 2018-07-20 16:12:58.740726 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).fault waiting 0.400000 2018-07-20 16:12:59.141377 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING 2018-07-20 16:12:59.141410 7f3a3970e700 20 Infiniband init started. 2018-07-20 16:12:59.141546 7f3a3970e700 20 Infiniband init successfully create queue pair: qp=0x7f3a30027970 2018-07-20 16:12:59.142182 7f3a3970e700 20 Infiniband init successfully change queue pair to INIT: qp=0x7f3a30027970 2018-07-20 16:12:59.142203 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0 2018-07-20 16:12:59.142291 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect tcp_fd: 18 2018-07-20 16:12:59.142307 7f3a3970e700 10 Infiniband send_msg sending: 11, 276438, 5515815, 0, fe80000000000000e41d2d030072ed72 2018-07-20 16:12:59.142331 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=18 mask=1 original mask is 0 2018-07-20 16:12:59.142350 7f3a3970e700 20 EpollDriver.add_event add event fd=18 cur_mask=0 add_mask=1 to 6 2018-07-20 16:12:59.142356 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event end fd=18 mask=1 original mask is 1 2018-07-20 16:12:59.142360 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=17 mask=1 original mask is 0 2018-07-20 16:12:59.142376 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=0 add_mask=1 to 6 2018-07-20 16:12:59.142380 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 
time_id=3).create_file_event create event end fd=17 mask=1 original mask is 1 2018-07-20 16:12:59.142383 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING 2018-07-20 16:12:59.142395 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection nonblock connect inprogress 2018-07-20 16:12:59.143312 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection QP: 276438 tcp_fd: 18 notify_fd: 17 2018-07-20 16:12:59.143337 7f3a3970e700 5 Infiniband recv_msg recevd: 0, 2134519808, 2315255808, 2134519808, ?? 2018-07-20 16:12:59.143341 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection peer msg : < 2134519808, 2315255808, 0, 2134519808> 2018-07-20 16:12:59.143346 7f3a3970e700 20 RDMAConnectedSocketImpl activate Choosing gid_index 0, sl 3 2018-07-20 16:12:59.143427 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTR state successfully. 2018-07-20 16:12:59.143495 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTS state successfully. 2018-07-20 16:12:59.143504 7f3a3970e700 20 RDMAConnectedSocketImpl activate QueuePair: 0x7f3a3002e6f0 with qp:0x7f3a30027970 2018-07-20 16:12:59.143507 7f3a3970e700 20 RDMAConnectedSocketImpl activate handle fake send, wake it up. QP: 276438 2018-07-20 16:12:59.143510 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 0 bytes. 
iov size: 0 2018-07-20 16:12:59.143529 7f3a3970e700 10 Infiniband send_msg sending: 11, 276438, 5515815, 2134519808, fe80000000000000e41d2d030072ed72 2018-07-20 16:12:59.143569 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE 2018-07-20 16:12:59.143585 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=1 delmask=2 to 6 2018-07-20 16:12:59.143589 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection connect successfully, ready to send banner 2018-07-20 16:12:59.143611 7f3a3970e700 20 RDMAConnectedSocketImpl send QP: 276438 2018-07-20 16:12:59.143613 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 9 bytes. iov size: 1 2018-07-20 16:12:59.143621 7f3a3970e700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1 2018-07-20 16:12:59.143623 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request QP: 276438 0x7f3a30011890 2018-07-20 16:12:59.143628 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS 2018-07-20 16:12:59.143713 7f3a3970e700 20 RDMAConnectedSocketImpl submit finished sending 9 bytes. 
2018-07-20 16:12:59.143721 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._try_send sent bytes 9 remaining bytes 0 2018-07-20 16:12:59.143738 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=17 mask=2 original mask is 1 2018-07-20 16:12:59.143744 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=1 add_mask=2 to 6 2018-07-20 16:12:59.143748 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event end fd=17 mask=2 original mask is 3 2018-07-20 16:12:59.143751 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect write banner done: 10.10.121.25:6789/0 2018-07-20 16:12:59.143764 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE 2018-07-20 16:12:59.143772 7f3a3970e700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 276438 r = 8 2018-07-20 16:12:59.143782 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).handle_write 2018-07-20 16:12:59.143794 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._try_send sent bytes 0 remaining bytes 0 2018-07-20 16:12:59.996654 7f3a267fc700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).mark_down 2018-07-20 16:12:59.996675 7f3a267fc700 2 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._stop 2018-07-20 16:12:59.996686 7f3a267fc700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).discard_out_queue started 2018-07-20 
16:12:59.996697 7f3a267fc700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f3a34150270 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).discard_out_queue discard 0x7f3a3414e720
2018-07-20 16:12:59.996715 7f3a267fc700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).wakeup
2018-07-20 16:12:59.996752 7f3a267fc700 10 -- - create_connect 10.10.121.25:6789/0, creating connection and registering
2018-07-20 16:12:59.996783 7f3a267fc700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c80013d0 :-1 s=STATE_NONE pgs=0 cs=0 l=1)._connect csq=0
2018-07-20 16:12:59.996814 7f3a267fc700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).wakeup
2018-07-20 16:12:59.996782 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=3 delmask=3 to 6
2018-07-20 16:12:59.996821 7f3a267fc700 10 -- - get_connection mon.0 10.10.121.25:6789/0 new 0x7f39c80013d0
2018-07-20 16:12:59.996839 7f3a3970e700 20 RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
2018-07-20 16:12:59.996843 7f3a3970e700 20 EpollDriver.del_event del event fd=18 cur_mask=1 delmask=1 to 6
2018-07-20 16:12:59.996844 7f3a267fc700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c80013d0 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_keepalive
2018-07-20 16:12:59.996861 7f3a267fc700 1 -- - --> 10.10.121.25:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- 0x7f39c8006040 con 0
2018-07-20 16:12:59.996871 7f3a267fc700 15 -- - >> 10.10.121.25:6789/0 conn(0x7f39c80013d0 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_message inline write is denied, reschedule m=0x7f39c8006040
2018-07-20 16:12:59.996919 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f39c80013d0 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING
...
2018-07-20 16:17:33.022882 7f3a3970e700 20 Infiniband init started.
2018-07-20 16:17:33.022986 7f3a3970e700 20 Infiniband init successfully create queue pair: qp=0x7f3a30000fb0
2018-07-20 16:17:33.023642 7f3a3970e700 20 Infiniband init successfully change queue pair to INIT: qp=0x7f3a30000fb0
2018-07-20 16:17:33.023656 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
2018-07-20 16:17:33.023732 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect tcp_fd: 18
2018-07-20 16:17:33.023746 7f3a3970e700 10 Infiniband send_msg sending: 11, 277532, 7190722, 0, fe80000000000000e41d2d030072ed72
2018-07-20 16:17:33.023768 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=18 mask=1 original mask is 0
2018-07-20 16:17:33.023793 7f3a3970e700 20 EpollDriver.add_event add event fd=18 cur_mask=0 add_mask=1 to 6
2018-07-20 16:17:33.023798 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event end fd=18 mask=1 original mask is 1
2018-07-20 16:17:33.023801 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=17 mask=1 original mask is 0
2018-07-20 16:17:33.023804 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=0 add_mask=1 to 6
2018-07-20 16:17:33.023807 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event end fd=17 mask=1 original mask is 1
2018-07-20 16:17:33.023810 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING
2018-07-20 16:17:33.023820 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection nonblock connect inprogress
2018-07-20 16:17:33.023836 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).handle_write
2018-07-20 16:17:33.023842 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).handle_write
2018-07-20 16:17:33.024805 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection QP: 277532 tcp_fd: 18 notify_fd: 17
2018-07-20 16:17:33.024832 7f3a3970e700 5 Infiniband recv_msg recevd: 0, 1684090482, 1986355044, 544501349, ??
2018-07-20 16:17:33.024835 7f3a3970e700 20 RDMAConnectedSocketImpl handle_connection peer msg : < 1684090482, 1986355044, 0, 544501349>
2018-07-20 16:17:33.024838 7f3a3970e700 20 RDMAConnectedSocketImpl activate Choosing gid_index 0, sl 3
2018-07-20 16:17:33.024932 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTR state successfully.
2018-07-20 16:17:33.024990 7f3a3970e700 20 RDMAConnectedSocketImpl activate transition to RTS state successfully.
2018-07-20 16:17:33.024996 7f3a3970e700 20 RDMAConnectedSocketImpl activate QueuePair: 0x7f3a3002f7a0 with qp:0x7f3a30000fb0
2018-07-20 16:17:33.025010 7f3a3970e700 20 RDMAConnectedSocketImpl activate handle fake send, wake it up. QP: 277532
2018-07-20 16:17:33.025013 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 0 bytes. iov size: 0
2018-07-20 16:17:33.025023 7f3a3970e700 10 Infiniband send_msg sending: 11, 277532, 7190722, 1684090482, fe80000000000000e41d2d030072ed72
2018-07-20 16:17:33.025050 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE
2018-07-20 16:17:33.025065 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=1 delmask=2 to 6
2018-07-20 16:17:33.025071 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._process_connection connect successfully, ready to send banner
2018-07-20 16:17:33.025083 7f3a3970e700 20 RDMAConnectedSocketImpl send QP: 277532
2018-07-20 16:17:33.025085 7f3a3970e700 20 RDMAConnectedSocketImpl submit we need 9 bytes. iov size: 1
2018-07-20 16:17:33.025090 7f3a3970e700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1
2018-07-20 16:17:33.025093 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request QP: 277532 0x7f3a30011850
2018-07-20 16:17:33.025097 7f3a3970e700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS
2018-07-20 16:17:33.025172 7f3a3970e700 20 RDMAConnectedSocketImpl submit finished sending 9 bytes.
2018-07-20 16:17:33.025178 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_RE pgs=0 cs=0 l=1)._try_send sent bytes 9 remaining bytes 0
2018-07-20 16:17:33.025186 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=17 mask=2 original mask is 1
2018-07-20 16:17:33.025189 7f3a3970e700 20 EpollDriver.add_event add event fd=17 cur_mask=1 add_mask=2 to 6
2018-07-20 16:17:33.025193 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event end fd=17 mask=2 original mask is 3
2018-07-20 16:17:33.025195 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._process_connection connect write banner done: 10.10.121.25:6789/0
2018-07-20 16:17:33.025203 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING_RE
2018-07-20 16:17:33.025210 7f3a3970e700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 277532 r = 8
2018-07-20 16:17:33.025218 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).handle_write
2018-07-20 16:17:33.025223 7f3a3970e700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._try_send sent bytes 0 remaining bytes 0
2018-07-20 16:17:34.140706 7f3a255d5700 20 RDMAStack polling got tx cq event.
2018-07-20 16:17:34.140722 7f3a255d5700 20 RDMAStack polling tx completion queue got 2 responses.
2018-07-20 16:17:34.140725 7f3a255d5700 1 RDMAStack handle_tx_event connection between server and client not working. Disconnect this now
2018-07-20 16:17:34.140728 7f3a255d5700 1 RDMAStack handle_tx_event missing qp_num=277520 discard event
2018-07-20 16:17:34.140731 7f3a255d5700 1 RDMAStack handle_tx_event Work Request Flushed Error: this connection's qp=277520 should be down while this WR=139887890327280 still in flight.
2018-07-20 16:17:34.140733 7f3a255d5700 1 RDMAStack handle_tx_event missing qp_num=277520 discard event
2018-07-20 16:17:34.140734 7f3a255d5700 1 RDMAStack handle_tx_event sending of the disconnect msg completed
2018-07-20 16:17:34.140738 7f3a255d5700 10 RDMAStack polling finally delete qp=0x7f3a3002e6f0
2018-07-20 16:17:34.140740 7f3a255d5700 20 Infiniband ~QueuePair destroy qp=0x7f3a3003ac10
2018-07-20 16:17:34.141496 7f3a255d5700 20 Infiniband rearm_notify started.
2018-07-20 16:17:34.141508 7f3a255d5700 20 Infiniband rearm_notify started.
2018-07-20 16:17:36.022943 7f3a267fc700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).mark_down
2018-07-20 16:17:36.022955 7f3a267fc700 2 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1)._stop
2018-07-20 16:17:36.022962 7f3a267fc700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).discard_out_queue started
2018-07-20 16:17:36.022967 7f3a267fc700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1 s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).discard_out_queue discard 0x7f39c80031b0
2018-07-20 16:17:36.022979 7f3a267fc700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).wakeup
2018-07-20 16:17:36.023003 7f3a267fc700 10 -- - create_connect 10.10.121.25:6789/0, creating connection and registering
2018-07-20 16:17:36.023019 7f3a267fc700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c800b750 :-1 s=STATE_NONE pgs=0 cs=0 l=1)._connect csq=0
2018-07-20 16:17:36.023026 7f3a267fc700 10 -- - get_connection mon.0 10.10.121.25:6789/0 new 0x7f39c800b750
2018-07-20 16:17:36.023041 7f3a267fc700 10 -- - >> 10.10.121.25:6789/0 conn(0x7f39c800b750 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_keepalive
2018-07-20 16:17:36.023047 7f3a267fc700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).wakeup
2018-07-20 16:17:36.023041 7f3a3970e700 20 EpollDriver.del_event del event fd=17 cur_mask=3 delmask=3 to 6
2018-07-20 16:17:36.023057 7f3a267fc700 1 -- - --> 10.10.121.25:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- 0x7f39c80031b0 con 0
2018-07-20 16:17:36.023060 7f3a3970e700 20 RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
2018-07-20 16:17:36.023067 7f3a3970e700 20 EpollDriver.del_event del event fd=18 cur_mask=1 delmask=1 to 6
2018-07-20 16:17:36.023068 7f3a267fc700 15 -- - >> 10.10.121.25:6789/0 conn(0x7f39c800b750 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_message inline write is denied, reschedule m=0x7f39c80031b0
2018-07-20 16:17:36.023127 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 conn(0x7f39c800b750 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).process prev state is STATE_CONNECTING
2018-07-20 16:17:36.023152 7f3a3970e700 20 Infiniband init started.
2018-07-20 16:17:36.023272 7f3a3970e700 20 Infiniband init successfully create queue pair: qp=0x7f3a3002d600
2018-07-20 16:17:36.023923 7f3a3970e700 20 Infiniband init successfully change queue pair to INIT: qp=0x7f3a3002d600
2018-07-20 16:17:36.023938 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect nonblock:1, nodelay:1, rbuf_size: 0
2018-07-20 16:17:36.024024 7f3a3970e700 20 RDMAConnectedSocketImpl try_connect tcp_fd: 18
2018-07-20 16:17:36.024037 7f3a3970e700 10 Infiniband send_msg sending: 11, 277544, 8220122, 0, fe80000000000000e41d2d030072ed72
2018-07-20 16:17:36.024060 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 time_id=3).create_file_event create event started fd=18 mask=1 original mask is 0
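With "debug ms = 20/20" the output runs to thousands of lines, so it can help to filter it down to the low-level (most important) messages from one subsystem. Below is a minimal Python sketch of that idea, not anything shipped with Ceph: it assumes the output has been saved as text and that every entry has the shape "timestamp thread-id level message" seen in the log above. The function names (split_entries, parse_entry, filter_entries) and the SAMPLE text are made up for illustration; the sample entries are copied from the log.

```python
import re

# A Ceph debug entry starts with a timestamp like "2018-07-20 16:17:34.140725".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6} ")
# Fields: timestamp, thread id (hex), debug level, message text.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ([0-9a-f]+) +(-?\d+) (.*)"
)


def split_entries(text):
    """Split pasted log text into entries, even if line breaks were lost.

    Anything before the first timestamp (e.g. a truncated entry) is dropped.
    """
    joined = " ".join(text.split())  # collapse newlines and repeated spaces
    starts = [m.start() for m in TIMESTAMP.finditer(joined)]
    ends = starts[1:] + [len(joined)]
    return [joined[a:b].strip() for a, b in zip(starts, ends)]


def parse_entry(entry):
    """Return a dict for one entry, or None if it does not match the layout."""
    m = ENTRY.match(entry)
    if m is None:
        return None
    time, thread, level, msg = m.groups()
    return {"time": time, "thread": thread, "level": int(level), "msg": msg}


def filter_entries(text, max_level=1, subsystem="RDMAStack"):
    """Keep entries at or below max_level whose message starts with subsystem."""
    hits = []
    for entry in split_entries(text):
        rec = parse_entry(entry)
        if rec and rec["level"] <= max_level and rec["msg"].startswith(subsystem):
            hits.append(rec)
    return hits


# Three entries copied from the log above, wrapped mid-entry on purpose.
SAMPLE = """2018-07-20 16:17:34.140722 7f3a255d5700 20 RDMAStack polling tx completion
queue got 2 responses. 2018-07-20 16:17:34.140725 7f3a255d5700 1 RDMAStack handle_tx_event
connection between server and client not working. Disconnect this now 2018-07-20
16:17:36.022943 7f3a267fc700 1 -- - >> 10.10.121.25:6789/0 conn(0x7f39c8006a60 :-1
s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=1).mark_down"""

for rec in filter_entries(SAMPLE):
    print(rec["time"], rec["msg"])
```

Run against the full output, filter_entries(text) would keep only the level-1 RDMAStack lines such as the "Work Request Flushed Error" message, which are probably the ones worth quoting in a support request.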
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com