Re: IPoIB child interfaces not working with mlx5


 



On Tue, Apr 20, 2021 at 1:29 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021 at 11:14:41AM +0200, Jinpu Wang wrote:
> > On Mon, Mar 22, 2021 at 7:56 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 07:08:01AM +0100, Jinpu Wang wrote:
> > > > On Sun, Mar 21, 2021 at 2:07 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sat, Mar 20, 2021 at 02:09:50PM +0100, Jack Wang wrote:
> > > > > > > On Sat, Mar 20, 2021 at 12:17, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > > On Fri, Mar 19, 2021 at 08:44:29AM +0100, Jinpu Wang wrote:
> > > > > > > > Hi Jason and Leon,
> > > > > > > >
> > > > > > > > > We recently switched from MLNX-OFED to upstream OFED, and we noticed
> > > > > > > > > that IPoIB stops working with upstream kernel 5.4.102 on a Mellanox
> > > > > > > > > CX-5 HCA, while it works fine on CX-2/CX-3. I also tested on a 5.11
> > > > > > > > > kernel; it behaves the same.
> > > > > > >
> > > > > > > Are you using "enhanced IPoIB" for CX-5 devices? MLX5_CORE_IPOIB?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > >  Yes.
> > > > >
> > > > > > > Is this expected behavior?
> > > > >
> > > > > Yes, we wanted to make IPoIB behave like any other netdev interfaces and
> > > > > if parent interface isn't enabled, no traffic should pass. More on that,
> > > > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > > > resources for both parent and child, this requires us to wait for "UP"
> > > > > event before allowing traffic.
> > > > >
> > > > > Thanks
> > > > Hi Leon,
> > > >
> > > > Thanks for the clarification, is this behavior documented somewhere?
> > > > is it specific to "enhanced IPoIB" for CX-5?
> > >
> > > It is specific to "enhanced IPoIB" and not to device. I don't know where
> > > we can document it.
> > >
> > > > Will it work differently if without MLX5_CORE_IPOIB enabled?
> > >
> > > Yes, without MLX5_CORE_IPOIB, the devices will work in "legacy IPoIB",
> > > exactly as cx-3. The best thing will be to change IPoIB ULP to behave
> > > like netdev, but we were not comfortable to do it back then due to
> > > user visible nature of such change.
> > >
> > Hi Leon,
> >
> > More testing reveals new problems with MLX5_CORE_IPOIB.
> > With MLX5_CORE_IPOIB, ping works on both hosts, but iperf3 doesn't send any data.

Just an update: we finally found the cause of the failure on our side.

We need to set the child interface to the same MTU as the parent.
jwang@xxxxxxxxxxxxxx:/mnt/jwang$ ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:ff:07:ce brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:ff:07:cf brd ff:ff:ff:ff:ff:ff
6: ha_transport: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether f6:ff:16:93:08:8a brd ff:ff:ff:ff:ff:ff
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:00:83:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
12: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:01:58:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:13 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
13: ib0.dddd@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:10:8c:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:dd:dd:00:00:00:00:00:00:ff:ff:ff:ff
14: ib1.dddd@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:11:8c:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:13 brd 00:ff:ff:ff:ff:12:40:1b:dd:dd:00:00:00:00:00:00:ff:ff:ff:ff

Initially, the ib0 MTU was 2044 and the ib0.dddd MTU was 4092.
After I reduced the ib0.dddd MTU to 2044 on both sides, iperf3 worked fine.
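
For anyone hitting the same symptom, here is a minimal sketch of how such a
parent/child MTU mismatch could be spotted. It assumes the one-line-per-interface
format printed by "ip -o link show", where a child appears as "child@parent:".
The function name mtu_mismatches is made up for illustration and is not part of
iproute2.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o link show` style lines on stdin and print
# "child child_mtu parent parent_mtu" for every child interface whose MTU
# differs from that of its parent (the name after "@").
mtu_mismatches() {
    awk '
    {
        name = $2
        sub(/:$/, "", name)                 # "ib0.dddd@ib0:" -> "ib0.dddd@ib0"
        mtu = ""
        for (i = 3; i < NF; i++)            # value follows the "mtu" keyword
            if ($i == "mtu") mtu = $(i + 1)
        split(name, parts, "@")             # parts[1] = child, parts[2] = parent
        mtus[parts[1]] = mtu
        if (parts[2] != "") {
            parent[parts[1]] = parts[2]
            order[++n] = parts[1]           # remember children in input order
        }
    }
    END {
        for (k = 1; k <= n; k++) {
            c = order[k]
            p = parent[c]
            if ((p in mtus) && mtus[c] != mtus[p])
                printf "%s %s %s %s\n", c, mtus[c], p, mtus[p]
        }
    }'
}

# Possible usage (needs root to change the MTU; shown as comments only):
#   ip -o link show | mtu_mismatches
#   ip link set dev ib0.dddd mtu 2044
```

The second command in the usage comment is the manual fix described above:
setting the child to the parent's MTU, on both hosts.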

Could you explain why the MTU must be exactly the same in enhanced IPoIB
mode? Is there anything else we must treat specially?
I guess it is related to this:

> > > > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > > > resources for both parent and child, this requires us to wait for "UP"
> > > > > event before allowing traffic.

Thanks!
Jinpu



