Re: mlx5: set_roce_address() / GID add failure

Thanks for the reply and the link. I believe that is a different
failure mode involving __ib_cache_gid_add(). In my case, there is no
traffic (the link is completely idle), and the failure persists no
matter how many times I toggle the link.
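
For anyone comparing against their own logs, here is a minimal Python
sketch that splits the mlx5_cmd_out_err line quoted below into its
fields. The regular expression and the parse_cmd_err() helper are my
own assumptions, based only on the shape of that one line; they are not
part of any driver or tool. The sketch does not try to interpret the
firmware syndrome; it only maps err(-22) to its errno name.

```python
import errno
import re

# The mlx5_cmd_out_err line from the quoted dmesg output, rejoined onto one line.
LINE = (
    "mlx5_core 0000:01:00.0: mlx5_cmd_out_err:803:(pid 1440): "
    "SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), "
    "syndrome (0x63c66), err(-22)"
)

# Hypothetical pattern, inferred from the single log line above.
PATTERN = re.compile(
    r"(?P<cmd>[A-Z0-9_]+)\((?P<opcode>0x[0-9a-f]+)\) "
    r"op_mod\((?P<op_mod>0x[0-9a-f]+)\) failed, "
    r"status (?P<status_text>.+?)\((?P<status>0x[0-9a-f]+)\), "
    r"syndrome \((?P<syndrome>0x[0-9a-f]+)\), "
    r"err\((?P<err>-?\d+)\)"
)


def parse_cmd_err(line):
    """Return the command error fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    err = int(fields["err"])
    return {
        "cmd": fields["cmd"],
        "opcode": int(fields["opcode"], 16),
        "op_mod": int(fields["op_mod"], 16),
        "status": int(fields["status"], 16),
        "status_text": fields["status_text"].strip(),
        "syndrome": int(fields["syndrome"], 16),
        "err": err,
        # The kernel reports negative errno values; look up the symbolic name.
        "errno_name": errno.errorcode.get(-err, "unknown"),
    }


if __name__ == "__main__":
    for key, value in parse_cmd_err(LINE).items():
        print(f"{key}: {value}")
```

The point of splitting the line this way is that the command name,
status, and syndrome can be compared field by field across reboots or
kernel versions instead of eyeballing wrapped dmesg output.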


-Jonathan

On Tue, Jul 18, 2023 at 9:28 PM William Kucharski
<william.kucharski@xxxxxxxxxx> wrote:
>
> Yes - it's NVIDIA issue 2326155:
>
> https://docs.nvidia.com/networking/display/MLNXOFEDv590560113/Known+Issues
>
> William Kucharski
>
> On Jul 18, 2023, at 19:06, Jonathan Nicklin <jnicklin@xxxxxxxxxxxxxxx> wrote:
>
> Hello,
>
> I've encountered an unexpected error configuring RDMA/RoCEv2 with one of our
> 200G ConnectX-6 NICs. This issue reproduces consistently on kernels 5.4.249
> and 6.4.3.
>
> dmesg output:
>
> [    9.863871] mlx5_core 0000:01:00.0: mlx5_cmd_out_err:803:(pid
> 1440): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad
> parameter(0x3), syndrome (0x63c66), err(-22)
> [    9.881250] infiniband mlx5_2: add_roce_gid GID add failed port=1 index=0
> [    9.889095] __ib_cache_gid_add: unable to add gid
> fe80:0000:0000:0000:ad3e:e3ff:fe92:b31b error=-22
>
> Device Type:      ConnectX6
> Part Number:      MCX653105A-HDA_Ax
> Description:      ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE ...
> PSID:             MT_0000000223
> PCI Device Name:  0000:01:00.0
>
> Firmware is up to date. LINK_TYPE is set to ETH(2) and ROCE_CONTROL is
> ROCE_ENABLE(2).
>
> Has anyone seen this syndrome? Any advice or assistance is appreciated.
>
> Thanks,
> -Jonathan



