On Mon, Mar 22, 2021 at 7:56 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Mon, Mar 22, 2021 at 07:08:01AM +0100, Jinpu Wang wrote:
> > On Sun, Mar 21, 2021 at 2:07 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > >
> > > On Sat, Mar 20, 2021 at 02:09:50PM +0100, Jack Wang wrote:
> > > > Leon Romanovsky <leon@xxxxxxxxxx> wrote on Sat, Mar 20, 2021 at 12:17:
> > > >
> > > > > On Fri, Mar 19, 2021 at 08:44:29AM +0100, Jinpu Wang wrote:
> > > > > > Hi Jason and Leon,
> > > > > >
> > > > > > We recently switch to use upstream OFED from MLNX-OFED, and we notice
> > > > > > IPoIB stop working with upstream kernel 5.4.102 with mellanox CX-5
> > > > > > HCA, it's working fine on CX-2/CX-3. I tested also on 5.11 kernel it
> > > > > > behaves the same.
> > > > >
> > > > > Are you using "enhanced IPoIB" for CX-5 devices? MLX5_CORE_IPOIB?
> > > > >
> > > > > Thanks
> > > >
> > > > Yes.
> > > >
> > > Is this expected behavior?
> > >
> > > Yes, we wanted to make IPoIB behave like any other netdev interfaces and
> > > if parent interface isn't enabled, no traffic should pass. More on that,
> > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > resources for both parent and child, this requires us to wait for "UP"
> > > event before allowing traffic.
> > >
> > > Thanks
> > Hi Leon,
> >
> > Thanks for the clarification, is this behavior documented somewhere?
> > is it specific to "enhanced IPoIB" for CX-5?
>
> It is specific to "enhanced IPoIB" and not to device. I don't know where
> we can document it.
>
> > Will it work differently if without MLX5_CORE_IPOIB enabled?
>
> Yes, without MLX5_CORE_IPOIB, the devices will work in "legacy IPoIB",
> exactly as cx-3. The best thing will be to change IPoIB ULP to behave
> like netdev, but we were not comfortable to do it back then due to
> user visible nature of such change.
>
Hi Leon,

More testing reveals new problems with MLX5_CORE_IPOIB.
With MLX5_CORE_IPOIB enabled, ping works on both hosts, but iperf3 doesn't
send any data.
I'm running on A: "iperf3 -s"
and on B: "sudo iperf3 -t 30000 -c ip6_of_A"

Example output:
[  5] local 2a02:247f:401:1:2:0:a:391 port 41288 connected to 2a02:247f:401:1:2:0:a:392 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   165 KBytes  1.35 Mbits/sec    2   3.93 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   3.93 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   3.93 KBytes

When I disable MLX5_CORE_IPOIB and run the same test, iperf3 runs
without problems:
[  5] local 2a02:247f:401:1:2:0:a:391 port 51866 connected to 2a02:247f:401:1:2:0:a:392 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   293 MBytes  2.46 Gbits/sec    0   1.50 MBytes
[  5]   1.00-2.00   sec   290 MBytes  2.43 Gbits/sec    0   1.50 MBytes
[  5]   2.00-3.00   sec   289 MBytes  2.42 Gbits/sec    0   1.50 MBytes
[  5]   3.00-4.00   sec   290 MBytes  2.43 Gbits/sec    0   1.50 MBytes

On both sides we have:
jwang@xxxxxxxxxxxxxx:/mnt/jwang$ ibstat
CA 'mlx5_0'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.27.2008
        Hardware version: 0
        Node GUID: 0x98039b03006c7912
        System image GUID: 0x98039b03006c7912
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 14
                LMC: 0
                SM lid: 19
                Capability mask: 0x2651e848
                Port GUID: 0x98039b03006c7912
                Link layer: InfiniBand
CA 'mlx5_1'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.27.2008
        Hardware version: 0
        Node GUID: 0x98039b03006c7913
        System image GUID: 0x98039b03006c7912
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 15
                LMC: 0
                SM lid: 45
                Capability mask: 0x2651e848
                Port GUID: 0x98039b03006c7913
                Link layer: InfiniBand

The initial tests were done on 5.4.102. I also did a brief test on
~linux-5.12-rc4 with MLX5_CORE_IPOIB enabled; iperf3 fails there in the
same way as on 5.4.102.
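
In case it helps anyone reproduce this, the stall can be detected without reading the output by eye by scanning the interval lines for zero-byte transfers. This is only a minimal sketch, assuming the plain-text iperf3 client output format pasted above (not the JSON output); the name `stalled_intervals` and the regular expression are my own illustration and not part of iperf3.

```python
import re

# Matches iperf3 client interval lines such as:
#   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   3.93 KBytes
# Assumption: plain-text output as pasted above, one interval per line.
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+"
    r"(?P<start>\d+\.\d+)-(?P<end>\d+\.\d+)\s+sec\s+"
    r"(?P<amount>\d+(?:\.\d+)?)\s+(?P<unit>[KMGT]?Bytes)"
)


def stalled_intervals(text):
    """Return (start, end) pairs for intervals that transferred zero bytes."""
    stalled = []
    for match in INTERVAL_RE.finditer(text):
        if float(match.group("amount")) == 0.0:
            stalled.append(
                (float(match.group("start")), float(match.group("end")))
            )
    return stalled


# Interval lines taken from the failing run with MLX5_CORE_IPOIB enabled.
sample = """\
[  5]   0.00-1.00   sec   165 KBytes  1.35 Mbits/sec    2   3.93 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   3.93 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   3.93 KBytes
"""
print(stalled_intervals(sample))
```

Fed the interval lines from the run with MLX5_CORE_IPOIB disabled, the same function returns an empty list, since every interval there carries data.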

cat /etc/network/interfaces.d/infiniband
auto ib0.beef
iface ib0.beef inet static
    address 10.42.3.145
    netmask 20
    up sysctl -w net.ipv4.conf.ib0/beef.forwarding=1
    up ethtool -K $IFACE gro off
    pre-up ip link set ib0 up
    dad-attempts 600

auto ib0.dddd
iface ib0.dddd inet6 static
    address 2a02:247f:401:1:2:0:a:391
    netmask 64
    pre-up ip link set ib0 up
    up sysctl -w net.ipv6.conf.ib0/dddd.forwarding=1 net.ipv6.conf.ib0/dddd.proxy_ndp=1
    up ip -6 route add fd57:1:0:4::/64 dev $IFACE
    up ethtool -K $IFACE gro off
    dad-attempts 600

auto ib1.beef
iface ib1.beef inet static
    address 10.43.3.145
    netmask 20
    up sysctl -w net.ipv4.conf.ib1/beef.forwarding=1
    up ethtool -K $IFACE gro off
    pre-up ip link set ib1 up
    dad-attempts 600

auto ib1.dddd
iface ib1.dddd inet6 static
    address 2a02:247f:402:1:2:0:a:391
    netmask 64
    pre-up ip link set ib1 up
    up sysctl -w net.ipv6.conf.ib1/dddd.forwarding=1 net.ipv6.conf.ib1/dddd.proxy_ndp=1
    up ip -6 route add fd57:2:0:4::/64 dev $IFACE
    up ethtool -K $IFACE gro off
    dad-attempts 600

Thanks!