[linux-rdma and rdma-core]: Unable to perform rdma_connect in a looped-back configuration


 



Hi All

I am seeing an issue that I think is a problem with rdma_cm and wanted to report it here to see if anyone has any advice. Basically, I have two HCAs in a single server connected via a network cable. I can run ping, iperf and other IP-based applications and I see traffic flow out one NIC, over the cable and in through the other NIC. However, any rdma_cm-based application fails at the rdma_connect step. [BTW I have confirmed that things work fine in a more traditional setup using two servers.]

The Details

1. 4.12.3 stable kernel.
2. rdma-core v14.
3. Mellanox CX5 100G HCAs configured for Ethernet (RoCE) mode.
4. Intel x86_64 CPU.

Using the NAT approach discussed in [1] I can set up IPv4 addresses on both HCAs such that I avoid the local loopback (the addresses I use are a little different from the ones in that reference, but the approach is identical). This allows ping, iperf and other IP-based applications to work just fine. For example:

<server>
iperf -s -B 172.18.1.1
</server>
<client>
iperf -c 172.18.11.1
</client>

works great and I can use packet counters to confirm the traffic is hitting the network cable.
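For reference, the NAT/routing rules I have in place follow the recipe in [1]; roughly the lines below, where 172.18.1.1 is the first port's real address, 172.18.11.1 its "fake" alias, and the second port's addresses and the interface names are placeholders rather than my exact values:

# route each fake address out of the opposite physical port (eth0/eth1 are placeholders)
ip route add 172.18.11.1 dev eth1
ip route add 172.18.12.1 dev eth0

# rewrite the fake destination back to the real address as packets arrive
iptables -t nat -A PREROUTING -d 172.18.11.1 -j DNAT --to-destination 172.18.1.1
iptables -t nat -A PREROUTING -d 172.18.12.1 -j DNAT --to-destination 172.18.2.1

# rewrite the real source to the fake alias on the way out, so replies also cross the cable
iptables -t nat -A POSTROUTING -s 172.18.2.1 -d 172.18.11.1 -j SNAT --to-source 172.18.12.1
iptables -t nat -A POSTROUTING -s 172.18.1.1 -d 172.18.12.1 -j SNAT --to-source 172.18.11.1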

However, if I try:

<server>
rping -s -a 172.18.1.1 -vVd
</server>
<client>
rping -c -a 172.18.11.1 -vVd
</client>

I see the following:

<server>
created cm_id 0xceded03170
rdma_bind_addr successful
rdma_listen
</server>
<client>
created cm_id 0x138702d110
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x138702d110 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x138702d110 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x138702cb80
created channel 0x138702cba0
created cq 0x138702cbc0
created qp 0x138702faf8
rping_setup_buffers called on cb 0x13870253c0
allocated & registered buffers...
cq_thread started.
cma_event type RDMA_CM_EVENT_UNREACHABLE cma_id 0x138702d110 (parent)
cma event RDMA_CM_EVENT_UNREACHABLE, error -110
wait for CONNECTED state 4
connect error -1
</client>

I've tried using configfs to switch the preferred RoCE mode, but that had no effect. I'd appreciate any ideas or input from anyone who might have got this working on their systems. I know there are other ways to solve this (e.g. (para)virtualization of the client), but I'd like to get this approach up and running if I can. BTW, as an extra piece of input, I also tried the in-kernel rdma_cm (via NVMe over Fabrics) and got a similar error.
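For reference, the configfs switch I tried was roughly the following (the device name and port number below stand in for my actual HCA and port):

mount -t configfs none /sys/kernel/config    # if not already mounted
cd /sys/kernel/config/rdma_cm
mkdir mlx5_0
echo "RoCE v2" > mlx5_0/ports/1/default_roce_mode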

Cheers
 
Stephen

[1] https://serverfault.com/questions/127636/force-local-ip-traffic-to-an-external-interface

