RE: regression: nvme rdma with bnxt_re0 broken

Hi Selvin,

> -----Original Message-----
> From: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
> Sent: Friday, July 12, 2019 9:16 AM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Yi Zhang <yi.zhang@xxxxxxxxxx>; linux-nvme@xxxxxxxxxxxxxxxxxxx; Daniel
> Jurgens <danielj@xxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Devesh
> Sharma <devesh.sharma@xxxxxxxxxxxx>
> Subject: Re: regression: nvme rdma with bnxt_re0 broken
> 
> On Fri, Jul 12, 2019 at 8:19 AM Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> >
> > Hi Yi Zhang,
> >
> > > -----Original Message-----
> > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> > > owner@xxxxxxxxxxxxxxx> On Behalf Of Yi Zhang
> > > Sent: Friday, July 12, 2019 7:23 AM
> > > To: Parav Pandit <parav@xxxxxxxxxxxx>;
> > > linux-nvme@xxxxxxxxxxxxxxxxxxx
> > > Cc: Daniel Jurgens <danielj@xxxxxxxxxxxx>;
> > > linux-rdma@xxxxxxxxxxxxxxx; Devesh Sharma
> > > <devesh.sharma@xxxxxxxxxxxx>; selvin.xavier@xxxxxxxxxxxx
> > > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> > >
> > > Hi Parav
> > >
> > > Here is the info; let me know if it's enough. Thanks.
> > >
> > > [root@rdma-perf-07 ~]$ echo -n "module ib_core +p" >
> > > /sys/kernel/debug/dynamic_debug/control
> > > [root@rdma-perf-07 ~]$ ifdown bnxt_roce
> > > Device 'bnxt_roce' successfully disconnected.
> > > [root@rdma-perf-07 ~]$ ifup bnxt_roce
> > > Connection successfully activated (D-Bus active path:
> > > /org/freedesktop/NetworkManager/ActiveConnection/16)
> > > [root@rdma-perf-07 ~]$ sh a.sh
> > > DEV       PORT  INDEX  GID                                      IPv4            VER  DEV
> > > ---       ----  -----  ---                                      ------------    ---  ---
> > > bnxt_re0  1     0      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v1   bnxt_roce
> > > bnxt_re0  1     1      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v2   bnxt_roce
> > > bnxt_re0  1     10     0000:0000:0000:0000:0000:ffff:ac1f:2bbb  172.31.43.187   v1   bnxt_roce.43
> > > bnxt_re0  1     11     0000:0000:0000:0000:0000:ffff:ac1f:2bbb  172.31.43.187   v2   bnxt_roce.43
> > > bnxt_re0  1     2      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v1   bnxt_roce.45
> > > bnxt_re0  1     3      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v2   bnxt_roce.45
> > > bnxt_re0  1     4      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v1   bnxt_roce.43
> > > bnxt_re0  1     5      fe80:0000:0000:0000:020a:f7ff:fee3:6e32                  v2   bnxt_roce.43
> > > bnxt_re0  1     6      0000:0000:0000:0000:0000:ffff:ac1f:28bb  172.31.40.187   v1   bnxt_roce
> > > bnxt_re0  1     7      0000:0000:0000:0000:0000:ffff:ac1f:28bb  172.31.40.187   v2   bnxt_roce
> > > bnxt_re0  1     8      0000:0000:0000:0000:0000:ffff:ac1f:2dbb  172.31.45.187   v1   bnxt_roce.45
> > > bnxt_re0  1     9      0000:0000:0000:0000:0000:ffff:ac1f:2dbb  172.31.45.187   v2   bnxt_roce.45
> > > bnxt_re1  1     0      fe80:0000:0000:0000:020a:f7ff:fee3:6e33                  v1   lom_2
> > > bnxt_re1  1     1      fe80:0000:0000:0000:020a:f7ff:fee3:6e33                  v2   lom_2
> > > cxgb4_0   1     0      0007:433b:f5b0:0000:0000:0000:0000:0000                  v1
> > > cxgb4_0   2     0      0007:433b:f5b8:0000:0000:0000:0000:0000                  v1
> > > hfi1_0    1     0      fe80:0000:0000:0000:0011:7501:0109:6c60                  v1
> > > hfi1_0    1     1      fe80:0000:0000:0000:0006:6a00:0000:0005                  v1
> > > mlx5_0    1     0      fe80:0000:0000:0000:506b:4b03:00f3:8a38                  v1
> > > n_gids_found=19
> > >
> > > [root@rdma-perf-07 ~]$ dmesg | tail -15
> > > [   19.744421] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8002: link becomes ready
> > > [   19.758371] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8004: link becomes ready
> > > [   20.010469] hfi1 0000:d8:00.0: hfi1_0: Switching to NO_DMA_RTAIL
> > > [   20.440580] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib0.8006: link becomes ready
> > > [   21.098510] bnxt_en 0000:19:00.0 bnxt_roce: Too many traffic classes requested: 8. Max supported is 2.
> > > [   21.324341] bnxt_en 0000:19:00.0 bnxt_roce: Too many traffic classes requested: 8. Max supported is 2.
> > > [   22.058647] IPv6: ADDRCONF(NETDEV_CHANGE): hfi1_opa0: link becomes ready
> > > [  211.407329] _ib_cache_gid_del: can't delete gid fe80:0000:0000:0000:020a:f7ff:fee3:6e32 error=-22
> > > [  211.407334] _ib_cache_gid_del: can't delete gid fe80:0000:0000:0000:020a:f7ff:fee3:6e32 error=-22
> > > [  211.425275] infiniband bnxt_re0: del_gid port=1 index=6 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
> > > [  211.425280] infiniband bnxt_re0: free_gid_entry_locked port=1 index=6 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
> > > [  211.425292] infiniband bnxt_re0: del_gid port=1 index=7 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
> > > [  211.425461] infiniband bnxt_re0: free_gid_entry_locked port=1 index=7 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
> > > [  225.474061] infiniband bnxt_re0: store_gid_entry port=1 index=6 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
> > > [  225.474075] infiniband bnxt_re0: store_gid_entry port=1 index=7 gid 0000:0000:0000:0000:0000:ffff:ac1f:28bb
> > >
> > >
> > GID table looks fine.
> >
> The GID table has the fe80:0000:0000:0000:020a:f7ff:fee3:6e32 entry repeated six
> times: two for each of the interfaces bnxt_roce, bnxt_roce.43 and bnxt_roce.45. Is it
> expected to have the same GID entries for the vlan and base interfaces? As you
> mentioned earlier, the driver's assumption that only two identical GID entries exist
> (one for RoCE v1 and one for RoCE v2) is breaking here.
> 
Yes, this is correct behavior. Each vlan netdev is in a different L2 segment, and each vlan netdev has its own IPv6 link-local address, so that address is added to the GID table.
A given GID table entry is identified uniquely by the tuple GID + netdev + type (RoCE v1/v2).
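To illustrate the rule (a rough sketch only, not the actual ib_core matching code; the field names follow struct ib_gid_attr):

static bool gid_entry_match(const struct ib_gid_attr *attr,
                            const union ib_gid *gid,
                            enum ib_gid_type gid_type,
                            struct net_device *ndev)
{
        /* A duplicate needs all three parts of the tuple to match:
         * the GID bytes, the netdev the entry was created for, and
         * the RoCE type (v1/v2).
         */
        return !memcmp(&attr->gid, gid, sizeof(*gid)) &&
               attr->gid_type == gid_type &&
               attr->ndev == ndev;
}

So the six identical-looking fe80:: entries above are all distinct: three netdevs times two RoCE types.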

Reviewing bnxt_qplib_add_sgid(), it does the comparison below:

if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {

This comparison looks incomplete: it ignores the netdev and the type.
But even with that, I would expect GID entry addition to fail for the vlans' IPv6 link-local entries.

But I am puzzled now: with the above memcmp() check, how do both the v1 and v2 entries get added to your table, and for the vlans too?
I would expect add_gid() and add_roce_gid() in core/cache.c to fail for the duplicate entries.
But the GID table that Yi Zhang dumped has these vlan entries, so I am missing something.
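For illustration, a hypothetical sketch of what a more complete duplicate check in bnxt_qplib_add_sgid() might look like; the per-entry vlan_id below is an assumed addition that the current sgid_tbl does not carry:

        for (i = 0; i < sgid_tbl->max; i++) {
                /* Compare the vlan as well, so the base interface's
                 * link-local GID does not collide with the identical
                 * GIDs of its vlan netdevs.
                 */
                if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
                    sgid_tbl->vlan_id[i] == vlan_id) /* assumed field */
                        return -EALREADY;
        }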

Yi Zhang,
Instead of the last 15 lines of dmesg, can you please share the whole dmesg log, with dynamic debug enabled before the vlans are created, using:

echo -n "module ib_core +p" > /sys/kernel/debug/dynamic_debug/control

Selvin,
Additionally, the driver shouldn't be checking for duplicate entries; the core already does that.
You might only want to do it for the v1/v2 case, as the bnxt driver has some dependency on it.
Can you please fix this part?

> > > On 7/12/19 12:18 AM, Parav Pandit wrote:
> > > > Sagi,
> > > >
> > > > This is better one to cc to linux-rdma.
> > > >
> > > > + Devesh, Selvin.
> > > >
> > > >> -----Original Message-----
> > > >> From: Parav Pandit
> > > >> Sent: Thursday, July 11, 2019 6:25 PM
> > > >> To: Yi Zhang <yi.zhang@xxxxxxxxxx>;
> > > >> linux-nvme@xxxxxxxxxxxxxxxxxxx
> > > >> Cc: Daniel Jurgens <danielj@xxxxxxxxxxxx>
> > > >> Subject: RE: regression: nvme rdma with bnxt_re0 broken
> > > >>
> > > >> Hi Yi Zhang,
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Yi Zhang <yi.zhang@xxxxxxxxxx>
> > > >>> Sent: Thursday, July 11, 2019 3:17 PM
> > > >>> To: linux-nvme@xxxxxxxxxxxxxxxxxxx
> > > >>> Cc: Daniel Jurgens <danielj@xxxxxxxxxxxx>; Parav Pandit
> > > >>> <parav@xxxxxxxxxxxx>
> > > >>> Subject: regression: nvme rdma with bnxt_re0 broken
> > > >>>
> > > >>> Hello
> > > >>>
> > > >>> 'nvme connect' failed when using bnxt_re0 on the latest upstream
> > > >>> build [1]; by bisecting, I found the failure was introduced in
> > > >>> v5.2.0-rc1 by [2], and it works after I revert it.
> > > >>> Let me know if you need more info, thanks.
> > > >>>
> > > >>> [1]
> > > >>> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n testnqn
> > > >>> Failed to write to /dev/nvme-fabrics: Bad address
> > > >>>
> > > >>> [root@rdma-perf-07 ~]$ dmesg
> > > >>> [  476.320742] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x4b9]=0x15 status 0x5
> >
> > Devesh, Selvin,
> >
> > What does this error mean?
> > bnxt_qplib_create_ah() is failing.
> >
> We are passing a wrong GID index to the FW because of the assumption
> mentioned earlier.
> The FW is failing the command due to this.
> 
> > > >>> [  476.327103] infiniband bnxt_re0: Failed to allocate HW AH
> > > >>> [  476.332525] nvme nvme2: rdma_connect failed (-14).
> > > >>> [  476.343552] nvme nvme2: rdma connection establishment failed (-14)
> > > >>>
> > > >>> [root@rdma-perf-07 ~]$ lspci | grep -i Broadcom
> > > >>> 01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
> > > >>> 01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
> > > >>> 18:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
> > > >>> 19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
> > > >>> 19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
> > > >>>
> > > >>>
> > > >>> [2]
> > > >>> commit 823b23da71132b80d9f41ab667c68b112455f3b6
> > > >>> Author: Parav Pandit <parav@xxxxxxxxxxxx>
> > > >>> Date:   Wed Apr 10 11:23:03 2019 +0300
> > > >>>
> > > >>>      IB/core: Allow vlan link local address based RoCE GIDs
> > > >>>
> > > >>>      IPv6 link local address for a VLAN netdevice has nothing to do with its
> > > >>>      resemblance with the default GID, because VLAN link local GID is in
> > > >>>      different layer 2 domain.
> > > >>>
> > > >>>      Now that RoCE MAD packet processing and route resolution consider the
> > > >>>      right GID index, there is no need for an unnecessary check which prevents
> > > >>>      the addition of vlan based IPv6 link local GIDs.
> > > >>>
> > > >>>      Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > > >>>      Reviewed-by: Daniel Jurgens <danielj@xxxxxxxxxxxx>
> > > >>>      Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > >>>      Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> > > >>>
> > > >>>
> > > >>>
> > > >>> Best Regards,
> > > >>>    Yi Zhang
> > > >>>
> > > >> I need some more information from you to debug this issue, as I
> > > >> don't have the hw.
> > > >> The highlighted patch added support for IPv6 link local addresses
> > > >> for vlans. I am unsure how this can affect IPv4 AH creation, which
> > > >> is what is failing here.
> > > >>
> > > >> 1. Before you assign the IP address to the netdevice, please do:
> > > >> echo -n "module ib_core +p" >
> > > >> /sys/kernel/debug/dynamic_debug/control
> > > >>
> > > >> Please share the below outputs before doing nvme connect.
> > > >> 2. Output of script [1]
> > > >> $ show_gids script
> > > >> If getting this script is problematic, share the output of,
> > > >>
> > > >> $ cat /sys/class/infiniband/bnxt_re0/ports/1/gids/*
> > > >> $ cat /sys/class/infiniband/bnxt_re0/ports/1/gid_attrs/ndevs/*
> > > >> $ ip link show
> > > >> $ ip addr show
> > > >> $ dmesg
> > > >>
> > > >> [1]
> > > >> https://community.mellanox.com/s/article/understanding-show-gids-
> > > >> script#jive_content_id_The_Script
> > > >>
> > > >> I suspect that the driver's assumption about GID indices might have
> > > >> gone wrong here in drivers/infiniband/hw/bnxt_re/ib_verbs.c.
> > > >> Let's see the results to confirm that.
> > > > _______________________________________________
> > > > Linux-nvme mailing list
> > > > Linux-nvme@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.infradead.org/mailman/listinfo/linux-nvme



