Why does each node have a different view of a node that rejoins the network in a full-mesh RDMACM configuration?

We have four nodes: A, B, C, and D. They use RDMACM in a full-mesh
configuration, which means each node is both a server and a client to
every other node. When the process on node C is stopped and restarted
after a few minutes, the other three nodes act as clients and initiate
active connections to node C. However, only node D connects
successfully. For nodes A and B, the connection to node C fails
because an RDMA_CM_EVENT_REJECTED event is received. The status value
of the event is 10 (according to IBTA, this means a stale connection).
It seems that each node has a different view of the rejoined node C.
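
For reference, here is a minimal sketch of how the status value carried
by a REJECTED event could be turned into a readable reason. It is plain
C with no librdmacm dependency. The helper names (rej_reason_name,
rej_is_stale) are hypothetical, and the small table is an assumption
that covers only a few of the IBTA CM reject reason codes as mirrored
in the Linux ib_cm.h header. It is not a complete list.

```c
#include <stddef.h>

/* Hypothetical subset of IBTA CM REJ reason codes (assumed values,
 * mirroring the IB_CM_REJ_* constants in Linux's ib_cm.h). */
enum rej_reason {
    REJ_TIMEOUT            = 4,
    REJ_INVALID_SERVICE_ID = 8,
    REJ_STALE_CONN         = 10,
    REJ_CONSUMER_DEFINED   = 28
};

/* Map the status of an RDMA_CM_EVENT_REJECTED event to a short name.
 * Returns "unknown" for codes that are not in the table above. */
static const char *rej_reason_name(int status)
{
    switch (status) {
    case REJ_TIMEOUT:            return "timeout";
    case REJ_INVALID_SERVICE_ID: return "invalid service ID";
    case REJ_STALE_CONN:         return "stale connection";
    case REJ_CONSUMER_DEFINED:   return "consumer defined";
    default:                     return "unknown";
    }
}

/* True when the reject reason is the stale-connection case (10). */
static int rej_is_stale(int status)
{
    return status == REJ_STALE_CONN;
}
```

In an event loop, such a helper would typically be called with the
status field of the event returned by rdma_get_cm_event() when the
event type is RDMA_CM_EVENT_REJECTED.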

Even more strangely, just after node D successfully connects to node
C, the connection between node A and node D (with D as server) and the
connection between node B and node D (also with D as server) are
disconnected almost simultaneously, because each side receives an
RDMA_CM_EVENT_DISCONNECTED event from the other.

Could you please help me check what the problem is? Thank you!

-- 
B.R.,
Zhijiang



