Re: Fail to establish RoCE connectivity after restarting network service

On Mon, 13 Jun 2022, Meng Wang wrote:

> We found a RoCE connectivity problem in such environment:
> * two servers with ConnectX4 NICs (model: MCX414A-BCAT)

That's pretty early RoCE hardware.

> * servers are connected to one SX6036 switch where flow control is enabled

That is an Ethernet-to-IB gateway. And we are doing RoCE?

So you are using the Ethernet ports on the SX6036? These tests are not
InfiniBand but purely Ethernet?

> rping is used to test RoCE link connectivity between servers.
> Initially, they can establish RoCE connections (rping to each other
> works). However, after we did ifdown/ifup the interfaces, restart
> network services, or rebooted the two servers, the connectivity between
> the two servers may become abnormal: i.e. sometimes the active side was
> stuck at "rdma_connect" without any CM event generated later; sometimes,
> the connection can be established, but the sender side failed to send
> message to the receiver (with error 12: IBV_WC_RETRY_EXC_ERR). If we
> repeat ifdown/up the affected interface or restart the network service
> for several rounds, the connectivity between the two servers can
> eventually become normal. We repeated this test on various linux
> distributions, OFED drivers and kernel versions as listed above, and
> found that this problem can be reproduced on all these setups. TCP/IP
> connections are always working as expected. We are not sure whether it
> is a bug or a configuration problem. Is there any method to troubleshoot
> this problem? Any suggestion is appreciated.

The rping connections use the RC logic, with special MAD packet
handshakes via QP1 (even under RoCE). IP is based on UD packets and thus does
not use the MAD handshakes to establish RC-based connections. This points
to an issue in the kernel's RoCE MAD handshake logic for establishing an RC
connection.

With RoCE v2, the MAD packets are encapsulated in UDP.
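
One way to check whether the handshake packets actually reach the other side is to capture the UDP traffic on both hosts and look for the CM MADs addressed to QP1. The following is only a rough sketch, assuming RoCE v2 (UDP destination port 4791) and the usual BTH/DETH/MAD header layout; the function name classify_cm_mad and the constants are made up for illustration and are not part of any library. It takes the UDP payload of a captured packet and reports which CM message, if any, it carries.

```python
ROCEV2_UDP_PORT = 4791   # UDP destination port used by RoCE v2
BTH_LEN = 12             # Base Transport Header length in bytes
DETH_LEN = 8             # Datagram Extended Transport Header (UD packets only)
UD_SEND_ONLY = 0x64      # BTH opcode for a UD "send only" packet, as used by MADs
MGMT_CLASS_CM = 0x07     # MAD management class for connection management

# CM attribute IDs mapped to the usual message names.
CM_ATTRIBUTES = {
    0x0010: "REQ", 0x0011: "MRA", 0x0012: "REJ", 0x0013: "REP",
    0x0014: "RTU", 0x0015: "DREQ", 0x0016: "DREP",
}


def classify_cm_mad(udp_dst_port, udp_payload):
    """Return the CM message name if the payload is a CM MAD sent to QP1.

    Returns None for anything else. Hypothetical helper, assuming RoCE v2.
    """
    if udp_dst_port != ROCEV2_UDP_PORT:
        return None
    # Need BTH + DETH plus enough of the MAD header to reach the attribute ID.
    if len(udp_payload) < BTH_LEN + DETH_LEN + 18:
        return None
    opcode = udp_payload[0]
    # Destination QP is the low 24 bits of the second 32-bit word of the BTH.
    dest_qp = int.from_bytes(udp_payload[5:8], "big")
    if opcode != UD_SEND_ONLY or dest_qp != 1:
        return None
    mad = udp_payload[BTH_LEN + DETH_LEN:]
    mgmt_class = mad[1]
    attr_id = int.from_bytes(mad[16:18], "big")
    if mgmt_class != MGMT_CLASS_CM:
        return None
    return CM_ATTRIBUTES.get(attr_id, "unknown CM attribute 0x%04x" % attr_id)
```

If the active side sends a REQ and no REP ever shows up in the capture on either host, that would be consistent with rdma_connect hanging without a CM event. The payloads have to come from a capture tool of your choice; this sketch does not do the capturing itself.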




