Re: mlx5: set_roce_address() / GID add failure

Thanks a ton, Mark. That's precisely what the issue was.

-Jonathan

On Tue, Jul 18, 2023 at 10:08 PM Mark Zhang <markzhang@xxxxxxxxxx> wrote:
>
> On 7/19/2023 9:40 AM, Jonathan Nicklin wrote:
> > Thanks for the reply and the link. I believe that is a different
> > failure mode involving __ib_cache_gid_add(). In my case, there is no
> > traffic (the link is completely idle). And, the failure mode is
> > persistent no matter how many times I "toggle the link."
> >
> >
> > -Jonathan
> >
> > On Tue, Jul 18, 2023 at 9:28 PM William Kucharski
> > <william.kucharski@xxxxxxxxxx> wrote:
> >>
> >> Yes - it's NVIDIA issue 2326155:
> >>
> >> https://docs.nvidia.com/networking/display/MLNXOFEDv590560113/Known+Issues
> >>
> >> William Kucharski
> >>
> >> On Jul 18, 2023, at 19:06, Jonathan Nicklin <jnicklin@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> I've encountered an unexpected error configuring RDMA/RoCEv2 on one of our
> >> 200G ConnectX-6 NICs. The issue reproduces consistently on kernels 5.4.249 and 6.4.3.
> >>
> >> dmesg output:
> >>
> >> [    9.863871] mlx5_core 0000:01:00.0: mlx5_cmd_out_err:803:(pid 1440): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x63c66), err(-22)
> >> [    9.881250] infiniband mlx5_2: add_roce_gid GID add failed port=1 index=0
> >> [    9.889095] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:ad3e:e3ff:fe92:b31b error=-22
> >>
>
> This syndrome seems to indicate that the source_mac is a multicast
> address, which is not allowed. For more information, please contact
> your NVIDIA support representative. Thanks.
>
> >> Device Type:      ConnectX6
> >> Part Number:      MCX653105A-HDA_Ax
> >> Description:      ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE ...
> >> PSID:             MT_0000000223
> >> PCI Device Name:  0000:01:00.0
> >>
> >> Firmware is up to date. LINK_TYPE is set to ETH(2) and ROCE_CONTROL is
> >> ROCE_ENABLE(2).
> >>
> >> Has anyone seen this syndrome? Any advice or assistance is appreciated.
> >>
> >> Thanks,
> >> -Jonathan
>
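Mark's diagnosis can be cross-checked from the dmesg output alone. The sketch below is a minimal illustration that assumes the failing index-0 GID was built from the port's MAC address using the modified EUI-64 scheme (ff:fe inserted in the middle of the MAC, universal/local bit flipped). Under that assumption the MAC can be recovered from the GID, and its I/G (group) bit, the least significant bit of the first octet, tells whether it is a multicast address. The function names are hypothetical helpers for illustration, not part of mlx5, rdma-core, or any NVIDIA tooling.

```python
import ipaddress


def mac_from_link_local_gid(gid: str) -> str:
    """Recover the 48-bit MAC embedded in a modified EUI-64 interface ID.

    Assumption: the lower 64 bits of the GID were derived from a MAC by
    inserting ff:fe after the third octet and flipping the U/L bit (0x02).
    """
    raw = ipaddress.IPv6Address(gid).packed  # 16 bytes, network byte order
    iid = raw[8:]                            # lower 64 bits: interface ID
    if iid[3:5] != b"\xff\xfe":
        raise ValueError("interface ID is not in modified EUI-64 form")
    mac = bytearray(iid[0:3] + iid[5:8])     # drop the inserted ff:fe
    mac[0] ^= 0x02                           # undo the universal/local bit flip
    return ":".join(f"{b:02x}" for b in mac)


def is_multicast_mac(mac: str) -> bool:
    """True if the I/G (group) bit, the LSB of the first octet, is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


if __name__ == "__main__":
    # GID copied from the __ib_cache_gid_add line in the log above.
    gid = "fe80:0000:0000:0000:ad3e:e3ff:fe92:b31b"
    mac = mac_from_link_local_gid(gid)
    print(mac, "multicast" if is_multicast_mac(mac) else "unicast")
```

Run against the GID from the log, this prints `af:3e:e3:92:b3:1b multicast`: the first octet 0xaf has its least significant bit set, which marks a group address and is consistent with SET_ROCE_ADDRESS rejecting the source MAC as a bad parameter.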
