Thanks a ton, Mark. That's precisely what the issue was.

-Jonathan

On Tue, Jul 18, 2023 at 10:08 PM Mark Zhang <markzhang@xxxxxxxxxx> wrote:
>
> On 7/19/2023 9:40 AM, Jonathan Nicklin wrote:
> > Thanks for the reply and the link. I believe that is a different
> > failure mode involving __ib_cache_gid_add(). In my case, there is no
> > traffic (the link is completely idle). And, the failure mode is
> > persistent no matter how many times I "toggle the link."
> >
> > -Jonathan
> >
> > On Tue, Jul 18, 2023 at 9:28 PM William Kucharski
> > <william.kucharski@xxxxxxxxxx> wrote:
> >>
> >> Yes - it's NVIDIA issue 2326155:
> >>
> >> https://docs.nvidia.com/networking/display/MLNXOFEDv590560113/Known+Issues
> >>
> >> William Kucharski
> >>
> >> On Jul 18, 2023, at 19:06, Jonathan Nicklin <jnicklin@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> I've encountered an unexpected error configuring RDMA/ROCEV2 with one of our
> >> 200G ConnectX6 NICS. This issue reproduces consistently on 5.4.249 and 6.4.3.
> >>
> >> dmesg output:
> >>
> >> [ 9.863871] mlx5_core 0000:01:00.0: mlx5_cmd_out_err:803:(pid
> >> 1440): SET_ROCE_ADDRESS(0x761) op_mod(0x0) failed, status bad
> >> parameter(0x3), syndrome (0x63c66), err(-22)
> >> [ 9.881250] infiniband mlx5_2: add_roce_gid GID add failed port=1 index=0
> >> [ 9.889095] __ib_cache_gid_add: unable to add gid
> >> fe80:0000:0000:0000:ad3e:e3ff:fe92:b31b error=-22
> >>
>
> Seems this syndrome indicates it's a multicast source_mac which is not
> allowed. For more information please contact your Nvidia support
> representative, thanks.
>
> >> Device Type: ConnectX6
> >> Part Number: MCX653105A-HDA_Ax
> >> Description: ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE ...
> >> PSID: MT_0000000223
> >> PCI Device Name: 0000:01:00.0
> >>
> >> Firmware is up to date. LINK_TYPE is to ETH(2) and ROCE_CONTROL is
> >> ROCE_ENABLE(2).
> >>
> >> Has anyone seen this syndrome? Any advice or assistance is appreciated.
> >>
> >> Thanks,
> >> -Jonathan
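
Mark's diagnosis can be cross-checked from the GID in the dmesg output. The following is a minimal sketch, assuming the failing GID is the default link-local address built from the port MAC via modified EUI-64 (RFC 4291, Appendix A). The helper names are hypothetical and not part of any mlx5 or rdma-core API.

```python
import ipaddress


def mac_from_link_local_gid(gid: str) -> bytes:
    """Recover the 6-byte MAC from a modified EUI-64 interface ID.

    Assumption: the GID was derived from the MAC (ff:fe inserted in the
    middle, universal/local bit flipped), as for a default RoCE GID.
    """
    iid = ipaddress.IPv6Address(gid).packed[8:]  # low 64 bits of the address
    if iid[3:5] != b"\xff\xfe":
        raise ValueError("interface ID does not look EUI-64 derived")
    # Undo the universal/local bit flip and drop the ff:fe filler bytes.
    return bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]


def is_multicast_mac(mac: bytes) -> bool:
    """The least significant bit of the first octet is the group (I/G) bit."""
    return bool(mac[0] & 0x01)


def format_mac(mac: bytes) -> str:
    return ":".join(f"{b:02x}" for b in mac)


gid = "fe80:0000:0000:0000:ad3e:e3ff:fe92:b31b"  # GID from the dmesg output
mac = mac_from_link_local_gid(gid)
print(format_mac(mac), "multicast" if is_multicast_mac(mac) else "unicast")
```

For the GID in this thread the recovered address is af:3e:e3:92:b3:1b. Its first octet (0xaf) has the group bit set, which is consistent with the "multicast source_mac" explanation. A unicast MAC has an even first octet.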