On Tue, Apr 20, 2021 at 1:29 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021 at 11:14:41AM +0200, Jinpu Wang wrote:
> > On Mon, Mar 22, 2021 at 7:56 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 07:08:01AM +0100, Jinpu Wang wrote:
> > > > On Sun, Mar 21, 2021 at 2:07 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sat, Mar 20, 2021 at 02:09:50PM +0100, Jack Wang wrote:
> > > > > > Leon Romanovsky <leon@xxxxxxxxxx> wrote on Sat, Mar 20, 2021 at 12:17:
> > > > > >
> > > > > > > On Fri, Mar 19, 2021 at 08:44:29AM +0100, Jinpu Wang wrote:
> > > > > > > > Hi Jason and Leon,
> > > > > > > >
> > > > > > > > We recently switch to use upstream OFED from MLNX-OFED, and we notice
> > > > > > > > IPoIB stop working with upstream kernel 5.4.102 with mellanox CX-5
> > > > > > > > HCA, it's working fine on CX-2/CX-3. I tested also on 5.11 kernel it
> > > > > > > > behaves the same.
> > > > > > >
> > > > > > > Are you using "enhanced IPoIB" for CX-5 devices? MLX5_CORE_IPOIB?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > Is this expected behavior?
> > > > >
> > > > > Yes, we wanted to make IPoIB behave like any other netdev interfaces and
> > > > > if parent interface isn't enabled, no traffic should pass. More on that,
> > > > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > > > resources for both parent and child, this requires us to wait for "UP"
> > > > > event before allowing traffic.
> > > > >
> > > > > Thanks
> > > > Hi Leon,
> > > >
> > > > Thanks for the clarification, is this behavior documented somewhere?
> > > > is it specific to "enhanced IPoIB" for CX-5?
> > >
> > > It is specific to "enhanced IPoIB" and not to device. I don't know where
> > > we can document it.
> > >
> > > > Will it work differently if without MLX5_CORE_IPOIB enabled?
> > >
> > > Yes, without MLX5_CORE_IPOIB, the devices will work in "legacy IPoIB",
> > > exactly as cx-3. The best thing will be to change IPoIB ULP to behave
> > > like netdev, but we were not comfortable to do it back then due to
> > > user visible nature of such change.
> > >
> > Hi Leon,
> >
> > More testing reveals new problems with MLX5_CORE_IPOIB.
> > With MLX5_CORE_IPOIB, ping works on both hosts, but iperf3 doesn't send any data.

Just want to give an update: we finally found the key that leads to the
failure on our side. We need to set the child interface to the same MTU as
the parent.

jwang@xxxxxxxxxxxxxx:/mnt/jwang$ ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:ff:07:ce brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:ff:07:cf brd ff:ff:ff:ff:ff:ff
6: ha_transport: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether f6:ff:16:93:08:8a brd ff:ff:ff:ff:ff:ff
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:00:83:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
12: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:01:58:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:13 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
13: ib0.dddd@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:10:8c:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:dd:dd:00:00:00:00:00:00:ff:ff:ff:ff
14: ib1.dddd@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:11:8c:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:13 brd 00:ff:ff:ff:ff:12:40:1b:dd:dd:00:00:00:00:00:00:ff:ff:ff:ff

Initially, the ib0 MTU was 2044 and the ib0.dddd MTU was 4092. After I
reduced the ib0.dddd MTU to 2044 on both sides, iperf3 works fine.

Could you explain why the MTU must be set to exactly the same value in
enhanced IPoIB mode? Is there anything else we must treat specially?

I guess it is related to this:
> > > > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > > > resources for both parent and child, this requires us to wait for "UP"
> > > > > event before allowing traffic.

Thanks!
Jinpu
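
A minimal sketch of how the parent/child MTU mismatch described above could be spotted from `ip link list` output. It assumes the `child@parent` naming shown in that output; the helper names (`parse_mtus`, `mtu_mismatches`) and the `SAMPLE` text are illustrative only and not part of any tool. It only reports differences and says nothing about whether equal MTUs are formally required; that is just what was observed in this thread.

```python
import re

# Matches the first line of each `ip link list` entry, for example:
#   "14: ib1.dddd@ib1: <BROADCAST,...> mtu 4092 qdisc mq ..."
# The "@parent" part is optional; indented "link/..." lines never match.
LINK_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@(?P<parent>[^:\s]+))?:\s.*?\bmtu\s+(?P<mtu>\d+)",
    re.MULTILINE,
)


def parse_mtus(ip_link_output):
    """Return {interface: (parent_or_None, mtu)} parsed from `ip link list` text."""
    links = {}
    for match in LINK_RE.finditer(ip_link_output):
        links[match.group("name")] = (match.group("parent"), int(match.group("mtu")))
    return links


def mtu_mismatches(ip_link_output):
    """Return (child, child_mtu, parent, parent_mtu) for each child whose MTU differs."""
    links = parse_mtus(ip_link_output)
    mismatches = []
    for name, (parent, mtu) in links.items():
        # Skip interfaces without a parent, or whose parent is not listed.
        if parent in links and links[parent][1] != mtu:
            mismatches.append((name, mtu, parent, links[parent][1]))
    return mismatches


# Abbreviated copy of the output quoted in this mail (illustrative input).
SAMPLE = """\
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:00:83:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
12: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
13: ib0.dddd@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
14: ib1.dddd@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 1024
"""

if __name__ == "__main__":
    for child, child_mtu, parent, parent_mtu in mtu_mismatches(SAMPLE):
        print(f"{child} mtu {child_mtu} differs from {parent} mtu {parent_mtu}")
```

On a live host the same functions could be fed the text of `ip link list` captured with `subprocess.run(["ip", "link", "list"], capture_output=True, text=True).stdout` instead of `SAMPLE`.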