I have Ubuntu 16.04.05 and ceph 13.2.1.
On fresh system install mon and mgr only for one node mtn7. Work ok.
systemctl stop ceph-mon@mtn7
systemctl stop ceph-mgr@mtn7
Next, I changed the settings for RDMA
for /lib/systemd/system/ceph-mon@.service
remove
PrivateDevice=yes
add
PrivateDevices=no
LimitMEMLOCK=infinity
systemctl daemon-reload
systemctl restart ceph-mon@mtn7
ceph-mon work correctly!
Nex step without systemd!:
VARIANT 1 (WORK):
# su -l ceph -s /bin/bash
$ id
uid=64045(ceph) gid=64045(ceph) groups=64045(ceph)
$ /usr/bin/ceph-mgr -f --cluster ceph --id mtn7
tcmalloc: large alloc 1074012160 bytes == 0x55c8d9714000 @ 0x7f60b82afc7b 0x7f60b8ea5029 0x7f60b8ea7a9b 0x7f60b8ea6e78 0x7f60b8ea6f69 0x7f60b8ea753c 0x7f60b8ead18a 0x7f60b8e84d23 0x7f60b8e89108 0x7f60b8e9a077 0x7f60b8e9c9dc 0x7f60b91d29df 0x7f60b80676ba 0x7f60b789041d (nil)
Work ALL!!!
VARIANT 2 (NOT WORK):
id
uid=0(root) gid=0(root) groups=0(root)
# /usr/bin/ceph-mgr -f --cluster ceph --id mtn7 --setuser ceph --setgroup ceph
tcmalloc: large alloc 1074012160 bytes == 0x561dfc18c000 @ 0x7f23dfcedc7b 0x7f23e08e3029 0x7f23e08e5a9b 0x7f23e08e4e78 0x7f23e08e4f69 0x7f23e08e553c 0x7f23e08eb18a 0x7f23e08c2d23 0x7f23e08c7108 0x7f23e08d8077 0x7f23e08da9dc 0x7f23e0c109df 0x7f23dfaa56ba 0x7f23df2ce41d (nil)
.....
*** Caught signal (Aborted) **
in thread 7f23d7353700 thread_name:rdma-polling
2018-08-02 15:41:42.762 7f23d7353700 -1 RDMAStack polling poll failed -4
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (()+0x255920) [0x561df23c6920]
2: (()+0x11390) [0x7f23dfaaf390]
3: (gsignal()+0x38) [0x7f23df1fc428]
4: (abort()+0x16a) [0x7f23df1fe02a]
5: (RDMADispatcher::polling()+0x1084) [0x7f23e08ef8a4]
6: (()+0x7569df) [0x7f23e0c109df]
7: (()+0x76ba) [0x7f23dfaa56ba]
8: (clone()+0x6d) [0x7f23df2ce41d]
2018-08-02 15:41:42.762 7f23d7353700 -1 *** Caught signal (Aborted) **
in thread 7f23d7353700 thread_name:rdma-polling
Aborted (core dumped)
WTF?
For test (start only) work next configuration:
/lib/systemd/system/ceph-mgr@.service
---
ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id %i
LimitMEMLOCK=infinity
User=ceph
Group=ceph
---
MY CONFIGURATION:
cat /etc/ceph/ceph.conf
[global]
fsid = 926768bc-add7-40d2-b7a7-1e6f9467ba3b
mon_initial_members = mtn7
mon_host = 10.63.3.77
public_network = 10.63.3.0/24
cluster network = 10.63.3.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
ms_type = async+rdma
ms_cluster_type = async+rdma
ms_async_rdma_device_name = mlx4_0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = fe80:0000:0000:0000:0cc4:7aff:ff37:cb5d
ms_async_rdma_port_num = 1
debug ms = 20/20
LOG:
*** Caught signal (Aborted) **
in thread 7fe1a2d43700 thread_name:rdma-polling
2018-08-02 15:45:34.431 7fe1a2d43700 -1 RDMAStack polling poll failed -4
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (()+0x255920) [0x557d016c5920]
2: (()+0x11390) [0x7fe1ab49f390]
3: (gsignal()+0x38) [0x7fe1aabec428]
4: (abort()+0x16a) [0x7fe1aabee02a]
5: (RDMADispatcher::polling()+0x1084) [0x7fe1ac2df8a4]
6: (()+0x7569df) [0x7fe1ac6009df]
7: (()+0x76ba) [0x7fe1ab4956ba]
8: (clone()+0x6d) [0x7fe1aacbe41d]
2018-08-02 15:45:34.431 7fe1a2d43700 -1 *** Caught signal (Aborted) **
in thread 7fe1a2d43700 thread_name:rdma-polling
.....
-300> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl send QP: 2809
-299> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl submit we need 8 bytes. iov size: 1
-298> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1
-297> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request QP: 2809 0x557d0c401fe0
-296> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling got tx cq event.
-295> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling tx completion queue got 1 responses.
-294> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-293> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-292> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS
-291> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl submit finished sending 8 bytes.
-290> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._try_send sent bytes 8 remaining bytes 0
-289> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_CONNECTING_READY pgs=0 cs=0 l=0)._process_connection send in_seq done
-288> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_CONNECTING_READY pgs=0 cs=0 l=0).process prev state is STATE_CONNECTING_WAIT_ACK_SEQ
-287> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._process_connection connect success 1, lossy = 1, features 4611087854031142907
-286> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_CONNECTING_READY
-285> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-284> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).handle_write
-283> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._append_keepalive_or_ack
-282> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message m auth(proto 0 29 bytes epoch 0) v1
-281> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message encoding features 4611087854031142907 0x557d04336780 auth(proto 0 29 bytes epoch 0) v1
-280> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message no session security
-279> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending message type=17 src client.? front=59 data=0 off 0
-278> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 1 0x557d04336780
-277> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl send QP: 2809
-276> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl submit we need 143 bytes. iov size: 1
-275> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1
-274> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request QP: 2809 0x557d0c401fe0
-273> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling got tx cq event.
-272> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling tx completion queue got 1 responses.
-271> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-270> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-269> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS
-268> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl submit finished sending 143 bytes.
-267> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._try_send sent bytes 143 remaining bytes 0
-266> 2018-08-02 15:45:34.423 7fe1a655d700 20 EpollDriver.del_event del event fd=25 cur_mask=3 delmask=2 to 7
-265> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 0x557d04336780 done.
-264> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).handle_write
-263> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-262> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-261> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-260> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-259> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-258> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 2809 r = 8
-257> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read poll queue got 1 responses. QP: 2809
-256> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_KEEPALIVE2_ACK pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-255> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_KEEPALIVE2_ACK pgs=21 cs=1 l=1).process got KEEPALIVE_ACK
-254> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN_KEEPALIVE2_ACK
-253> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-252> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-251> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-250> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-249> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-248> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-247> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 2809 r = 8
-246> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read poll queue got 1 responses. QP: 2809
-245> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-244> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process begin MSG
-243> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got MSG header
-242> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got envelope type=4 src mon.0 front=184 data=0 off 0
-241> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
-240> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
-239> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
-238> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
-237> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process got front 184
-236> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process aborted = 0
-235> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process got 184 + 0 + 0 byte message
-234> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process no session security set
-233> 2018-08-02 15:45:34.423 7fe1a655d700 5 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1). rx mon.0 seq 1 0x557d04169440 mon_map magic: 0 v1
-232> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 queue 0x557d04169440 prio 196
-231> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_READ_FRONT
-230> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-229> 2018-08-02 15:45:34.423 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 <== mon.0 10.63.3.77:6789/0 1 ==== mon_map magic: 0 v1 ==== 184+0+0 (3227989268 0 0) 0x557d04169440 con 0x557d043d0700
-228> 2018-08-02 15:45:34.423 7fe1a555b700 10 monclient(hunting): handle_monmap mon_map magic: 0 v1
-227> 2018-08-02 15:45:34.423 7fe1a555b700 10 monclient(hunting): got monmap 1, mon.noname-a is now rank -1
-226> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-225> 2018-08-02 15:45:34.423 7fe1a555b700 10 monclient(hunting): dump:
epoch 1
fsid 926768bc-add7-40d2-b7a7-1e6f9467ba3b
last_changed 2018-08-02 09:41:20.154808
created 2018-08-02 09:41:20.154808
0: 10.63.3.77:6789/0 mon.mtn7
-224> 2018-08-02 15:45:34.423 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-223> 2018-08-02 15:45:34.423 7fe1a555b700 10 -- 10.63.3.77:0/3249334359 dispatch_throttle_release 184 to dispatch throttler 184/104857600
-222> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-221> 2018-08-02 15:45:34.423 7fe1a2d43700 20 Infiniband rearm_notify started.
-220> 2018-08-02 15:45:34.423 7fe1a555b700 20 -- 10.63.3.77:0/3249334359 done calling dispatch on 0x557d04169440
-219> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-218> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 2809 r = 8
-217> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read poll queue got 1 responses. QP: 2809
-216> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-215> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process begin MSG
-214> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got MSG header
-213> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got envelope type=18 src mon.0 front=33 data=0 off 0
-212> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
-211> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
-210> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
-209> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
-208> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process got front 33
-207> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process aborted = 0
-206> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process got 33 + 0 + 0 byte message
-205> 2018-08-02 15:45:34.423 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process no session security set
-204> 2018-08-02 15:45:34.423 7fe1a655d700 5 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1). rx mon.0 seq 2 0x557d04336780 auth_reply(proto 2 0 (0) Success) v1
-203> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 queue 0x557d04336780 prio 196
-202> 2018-08-02 15:45:34.423 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_READ_FRONT
-201> 2018-08-02 15:45:34.423 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-200> 2018-08-02 15:45:34.423 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 <== mon.0 10.63.3.77:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (921552047 0 0) 0x557d04336780 con 0x557d043d0700
-199> 2018-08-02 15:45:34.427 7fe1a555b700 10 monclient(hunting): my global_id is 54114
-198> 2018-08-02 15:45:34.427 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 --> 10.63.3.77:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x557d04336f00 con 0
-197> 2018-08-02 15:45:34.427 7fe1a555b700 15 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).send_message inline write is denied, reschedule m=0x557d04336f00
-196> 2018-08-02 15:45:34.427 7fe1a555b700 20 Event(0x557d041cae40 nevent=5000 time_id=2).wakeup
-195> 2018-08-02 15:45:34.427 7fe1a555b700 10 -- 10.63.3.77:0/3249334359 dispatch_throttle_release 33 to dispatch throttler 33/104857600
-194> 2018-08-02 15:45:34.427 7fe1a555b700 20 -- 10.63.3.77:0/3249334359 done calling dispatch on 0x557d04336780
-193> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).handle_write
-192> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message m auth(proto 2 32 bytes epoch 0) v1
-191> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message encoding features 4611087854031142907 0x557d04336f00 auth(proto 2 32 bytes epoch 0) v1
-190> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message no session security
-189> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending message type=17 src client.? front=62 data=0 off 0
-188> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 2 0x557d04336f00
-187> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl send QP: 2809
-186> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit we need 137 bytes. iov size: 1
-185> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1
-184> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request QP: 2809 0x557d0c401fe0
-183> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got tx cq event.
-182> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling tx completion queue got 1 responses.
-181> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-180> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-179> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS
-178> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit finished sending 137 bytes.
-177> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._try_send sent bytes 137 remaining bytes 0
-176> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 0x557d04336f00 done.
-175> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-174> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-173> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-172> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-171> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-170> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 2809 r = 8
-169> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read poll queue got 1 responses. QP: 2809
-168> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-167> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process begin MSG
-166> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got MSG header
-165> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got envelope type=18 src mon.0 front=206 data=0 off 0
-164> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
-163> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
-162> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
-161> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
-160> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process got front 206
-159> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process aborted = 0
-158> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process got 206 + 0 + 0 byte message
-157> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process no session security set
-156> 2018-08-02 15:45:34.427 7fe1a655d700 5 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1). rx mon.0 seq 3 0x557d04336f00 auth_reply(proto 2 0 (0) Success) v1
-155> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 queue 0x557d04336f00 prio 196
-154> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_READ_FRONT
-153> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-152> 2018-08-02 15:45:34.427 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 <== mon.0 10.63.3.77:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (1306133030 0 0) 0x557d04336f00 con 0x557d043d0700
-151> 2018-08-02 15:45:34.427 7fe1a555b700 1 monclient: found mon.mtn7
-150> 2018-08-02 15:45:34.427 7fe1a555b700 10 monclient: _send_mon_message to mon.mtn7 at 10.63.3.77:6789/0
-149> 2018-08-02 15:45:34.427 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 --> 10.63.3.77:6789/0 -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x557d04168240 con 0
-148> 2018-08-02 15:45:34.427 7fe1a555b700 15 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).send_message inline write is denied, reschedule m=0x557d04168240
-147> 2018-08-02 15:45:34.427 7fe1a555b700 20 Event(0x557d041cae40 nevent=5000 time_id=2).wakeup
-146> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).handle_write
-145> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message m mon_subscribe({config=0+,monmap=0+}) v3
-144> 2018-08-02 15:45:34.427 7fe1a555b700 10 monclient: _check_auth_rotating renewing rotating keys (they expired before 2018-08-02 15:45:04.429459)
-143> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message encoding features 4611087854031142907 0x557d04168240 mon_subscribe({config=0+,monmap=0+}) v3
-142> 2018-08-02 15:45:34.427 7fe1a555b700 10 monclient: _send_mon_message to mon.mtn7 at 10.63.3.77:6789/0
-141> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message no session security
-140> 2018-08-02 15:45:34.427 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 --> 10.63.3.77:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x557d04336780 con 0
-139> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending message type=15 src client.? front=46 data=0 off 0
-138> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 3 0x557d04168240
-137> 2018-08-02 15:45:34.427 7fe1a555b700 15 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).send_message inline write is denied, reschedule m=0x557d04336780
-136> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl send QP: 2809
-135> 2018-08-02 15:45:34.427 7fe1a555b700 20 Event(0x557d041cae40 nevent=5000 time_id=2).wakeup
-134> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit we need 121 bytes. iov size: 3
-133> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1
-132> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request QP: 2809 0x557d0c401fe0
-131> 2018-08-02 15:45:34.427 7fe1a555b700 10 -- 10.63.3.77:0/3249334359 dispatch_throttle_release 206 to dispatch throttler 206/104857600
-130> 2018-08-02 15:45:34.427 7fe1a555b700 20 -- 10.63.3.77:0/3249334359 done calling dispatch on 0x557d04336f00
-129> 2018-08-02 15:45:34.427 7fe1b4e57380 5 monclient: authenticate success, global_id 54114
-128> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got tx cq event.
-127> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling tx completion queue got 1 responses.
-126> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-125> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-124> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS
-123> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit finished sending 121 bytes.
-122> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._try_send sent bytes 121 remaining bytes 0
-121> 2018-08-02 15:45:34.427 7fe1a655d700 20 Event(0x557d041cae40 nevent=5000 time_id=2).create_file_event create event started fd=25 mask=2 original mask is 1
-120> 2018-08-02 15:45:34.427 7fe1a655d700 20 EpollDriver.add_event add event fd=25 cur_mask=1 add_mask=2 to 7
-119> 2018-08-02 15:45:34.427 7fe1a655d700 20 Event(0x557d041cae40 nevent=5000 time_id=2).create_file_event create event end fd=25 mask=2 original mask is 3
-118> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 0x557d04168240 done.
-117> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message m auth(proto 2 2 bytes epoch 0) v1
-116> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).prepare_send_message encoding features 4611087854031142907 0x557d04336780 auth(proto 2 2 bytes epoch 0) v1
-115> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message no session security
-114> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending message type=17 src client.? front=32 data=0 off 0
-113> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 4 0x557d04336780
-112> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl send QP: 2809
-111> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit we need 107 bytes. iov size: 1
-110> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit left bytes: 0 in buffers 0 tx chunks 1
-109> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request QP: 2809 0x557d0c401fe0
-108> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got tx cq event.
-107> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling tx completion queue got 1 responses.
-106> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-105> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-104> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl post_work_request qp state is IBV_QPS_RTS
-103> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl submit finished sending 107 bytes.
-102> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._try_send sent bytes 107 remaining bytes 0
-101> 2018-08-02 15:45:34.427 7fe1a655d700 20 EpollDriver.del_event del event fd=25 cur_mask=3 delmask=2 to 7
-100> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).write_message sending 0x557d04336780 done.
-99> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).handle_write
-98> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-97> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-96> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-95> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-94> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-93> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 2809 r = 8
-92> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read poll queue got 1 responses. QP: 2809
-91> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-90> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process begin MSG
-89> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got MSG header
-88> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got envelope type=62 src mon.0 front=4 data=0 off 0
-87> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
-86> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
-85> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
-84> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
-83> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process got front 4
-82> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process aborted = 0
-81> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process got 4 + 0 + 0 byte message
-80> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process no session security set
-79> 2018-08-02 15:45:34.427 7fe1a655d700 5 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1). rx mon.0 seq 4 0x557d04168240 config(0 keys) v1
-78> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 queue 0x557d04168240 prio 196
-77> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_READ_FRONT
-76> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-75> 2018-08-02 15:45:34.427 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 <== mon.0 10.63.3.77:6789/0 4 ==== config(0 keys) v1 ==== 4+0+0 (0 0 0) 0x557d04168240 con 0x557d043d0700
-74> 2018-08-02 15:45:34.427 7fe1a555b700 10 monclient: handle_config config(0 keys) v1
-73> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-72> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-71> 2018-08-02 15:45:34.427 7fe1a555b700 10 -- 10.63.3.77:0/3249334359 dispatch_throttle_release 4 to dispatch throttler 4/104857600
-70> 2018-08-02 15:45:34.427 7fe1a3d58700 4 set_mon_vals no callback set
-69> 2018-08-02 15:45:34.427 7fe1a555b700 20 -- 10.63.3.77:0/3249334359 done calling dispatch on 0x557d04168240
-68> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-67> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-66> 2018-08-02 15:45:34.427 7fe1b4e57380 10 monclient: get_monmap_and_config success
-65> 2018-08-02 15:45:34.427 7fe1b4e57380 10 monclient: shutdown
-64> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-63> 2018-08-02 15:45:34.427 7fe1b4e57380 1 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).mark_down
-62> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 1 in 2809 r = 8
-61> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read poll queue got 1 responses. QP: 2809
-60> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process prev state is STATE_OPEN
-59> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process begin MSG
-58> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got MSG header
-57> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_HEADER pgs=21 cs=1 l=1).process got envelope type=4 src mon.0 front=184 data=0 off 0
-56> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_MESSAGE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_HEADER
-55> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_BYTES pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_MESSAGE
-54> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_BYTES
-53> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_THROTTLE_DISPATCH_QUEUE
-52> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FRONT pgs=21 cs=1 l=1).process got front 184
-51> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process aborted = 0
-50> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process got 184 + 0 + 0 byte message
-49> 2018-08-02 15:45:34.427 7fe1a655d700 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1).process no session security set
-48> 2018-08-02 15:45:34.427 7fe1a655d700 5 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=21 cs=1 l=1). rx mon.0 seq 5 0x557d04169680 mon_map magic: 0 v1
-47> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 queue 0x557d04169680 prio 196
-46> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).process prev state is STATE_OPEN_MESSAGE_READ_FRONT
-45> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got rx cq event.
-44> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl read notify_fd : 0 in 2809 r = -1
-43> 2018-08-02 15:45:34.427 7fe1a555b700 1 -- 10.63.3.77:0/3249334359 <== mon.0 10.63.3.77:6789/0 5 ==== mon_map magic: 0 v1 ==== 184+0+0 (3227989268 0 0) 0x557d04169680 con 0x557d043d0700
-42> 2018-08-02 15:45:34.427 7fe1b4e57380 2 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1)._stop
-41> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling rx completion queue got 1 responses.
-40> 2018-08-02 15:45:34.427 7fe1b4e57380 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_OPEN pgs=21 cs=1 l=1).discard_out_queue started
-39> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cae40 nevent=5000 time_id=2).wakeup
-38> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-37> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-36> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_CLOSED pgs=21 cs=1 l=1).process prev state is STATE_CLOSED
-35> 2018-08-02 15:45:34.427 7fe1a655d700 20 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:6789/0 conn(0x557d043d0700 :-1 s=STATE_CLOSED pgs=21 cs=1 l=1).process socket closed
-34> 2018-08-02 15:45:34.427 7fe1a555b700 10 monclient: discarding stray monitor message mon_map magic: 0 v1
-33> 2018-08-02 15:45:34.427 7fe1a655d700 20 EpollDriver.del_event del event fd=25 cur_mask=1 delmask=3 to 7
-32> 2018-08-02 15:45:34.427 7fe1a555b700 10 -- 10.63.3.77:0/3249334359 dispatch_throttle_release 184 to dispatch throttler 184/104857600
-31> 2018-08-02 15:45:34.427 7fe1a555b700 20 -- 10.63.3.77:0/3249334359 done calling dispatch on 0x557d04169680
-30> 2018-08-02 15:45:34.427 7fe1a655d700 20 RDMAConnectedSocketImpl ~RDMAConnectedSocketImpl destruct.
-29> 2018-08-02 15:45:34.427 7fe1a655d700 20 EpollDriver.del_event del event fd=26 cur_mask=1 delmask=1 to 7
-28> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling got tx cq event.
-27> 2018-08-02 15:45:34.427 7fe1a2d43700 20 RDMAStack polling tx completion queue got 1 responses.
-26> 2018-08-02 15:45:34.427 7fe1a2d43700 1 RDMAStack handle_tx_event sending of the disconnect msg completed
-25> 2018-08-02 15:45:34.427 7fe1a2d43700 10 RDMAStack polling finally delete qp=0x557d04146300
-24> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband ~QueuePair destroy qp=0x557d0413d2c0
-23> 2018-08-02 15:45:34.427 7fe1b4e57380 10 -- 10.63.3.77:0/3249334359 shutdown 10.63.3.77:0/3249334359
-22> 2018-08-02 15:45:34.427 7fe1b4e57380 10 Processor -- stop
-21> 2018-08-02 15:45:34.427 7fe1b4e57380 1 -- 10.63.3.77:0/3249334359 shutdown_connections
-20> 2018-08-02 15:45:34.427 7fe1b4e57380 5 -- 10.63.3.77:0/3249334359 shutdown_connections mark down 10.63.3.77:6789/0 0x557d043d0700
-19> 2018-08-02 15:45:34.427 7fe1b4e57380 5 -- 10.63.3.77:0/3249334359 shutdown_connections delete 0x557d043d0700
-18> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cab80 nevent=5000 time_id=1).wakeup
-17> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-16> 2018-08-02 15:45:34.427 7fe1a2d43700 20 Infiniband rearm_notify started.
-15> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cae40 nevent=5000 time_id=2).wakeup
-14> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cb100 nevent=5000 time_id=1).wakeup
-13> 2018-08-02 15:45:34.427 7fe1b4e57380 10 -- 10.63.3.77:0/3249334359 wait: waiting for dispatch queue
-12> 2018-08-02 15:45:34.427 7fe1b4e57380 10 -- 10.63.3.77:0/3249334359 wait: dispatch queue is stopped
-11> 2018-08-02 15:45:34.427 7fe1b4e57380 1 -- 10.63.3.77:0/3249334359 shutdown_connections
-10> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cab80 nevent=5000 time_id=1).wakeup
-9> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cae40 nevent=5000 time_id=2).wakeup
-8> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cb100 nevent=5000 time_id=1).wakeup
-7> 2018-08-02 15:45:34.427 7fe1b4e57380 10 -- 10.63.3.77:0/3249334359 wait: done.
-6> 2018-08-02 15:45:34.427 7fe1b4e57380 1 -- 10.63.3.77:0/3249334359 wait complete.
-5> 2018-08-02 15:45:34.427 7fe1b4e57380 1 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:0/3249334359 conn(0x557d043d0000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
-4> 2018-08-02 15:45:34.427 7fe1b4e57380 2 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:0/3249334359 conn(0x557d043d0000 :-1 s=STATE_NONE pgs=0 cs=0 l=0)._stop
-3> 2018-08-02 15:45:34.427 7fe1b4e57380 10 -- 10.63.3.77:0/3249334359 >> 10.63.3.77:0/3249334359 conn(0x557d043d0000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).discard_out_queue started
-2> 2018-08-02 15:45:34.427 7fe1b4e57380 20 Event(0x557d041cab80 nevent=5000 time_id=1).wakeup
-1> 2018-08-02 15:45:34.431 7fe1a2d43700 -1 RDMAStack polling poll failed -4
0> 2018-08-02 15:45:34.431 7fe1a2d43700 -1 *** Caught signal (Aborted) **
in thread 7fe1a2d43700 thread_name:rdma-polling
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (()+0x255920) [0x557d016c5920]
2: (()+0x11390) [0x7fe1ab49f390]
3: (gsignal()+0x38) [0x7fe1aabec428]
4: (abort()+0x16a) [0x7fe1aabee02a]
5: (RDMADispatcher::polling()+0x1084) [0x7fe1ac2df8a4]
6: (()+0x7569df) [0x7fe1ac6009df]
7: (()+0x76ba) [0x7fe1ab4956ba]
8: (clone()+0x6d) [0x7fe1aacbe41d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Stanislav.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com