Re: Missing infiniband network interfaces after update to 5.14/5.15

On Thu, Nov 11, 2021 at 12:29 PM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> On Thu, Nov 11, 2021 at 08:48:08AM +0100, Jinpu Wang wrote:
> > Hi Jason, hi Leon,
> >
> > We are seeing exactly the same error reported here:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2014094
> >
> > I suspect it's related to
> > https://lore.kernel.org/all/cover.1623427137.git.leonro@xxxxxxxxxx/
> >
> > Do you have any idea, what goes wrong?
>
> I can't reproduce it with latest Fedora 34 RPM, which I downloaded from here
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1851842
>
> and also with kernel-5.14.7-200.fc34.x86_64 version mentioned in the bug
> report.
>
> [leonro@c-235-8-1-005 ~]$ uname -a
> Linux c-235-8-1-005 5.14.7-200.fc34.x86_64 #1 SMP Wed Sep 22 14:54:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> [leonro@c-235-8-1-005 ~]$ rdma dev
> 0: ibp8s0f0: node_type ca fw 2.42.5000 node_guid 1c34:da03:0007:7950 sys_image_guid 1c34:da03:0007:7953
> 1: ibp9s0f0: node_type ca fw 2.42.5000 node_guid 1c34:da03:0007:7a60 sys_image_guid 1c34:da03:0007:7a63
>
> [leonro@c-235-8-1-005 ~]$ uname -a
> Linux c-235-8-1-005 5.14.16-201.fc34.x86_64 #1 SMP Wed Nov 3 13:57:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> [leonro@c-235-8-1-005 ~]$ rdma dev
> 0: ibp8s0f0: node_type ca fw 2.42.5000 node_guid 1c34:da03:0007:7950 sys_image_guid 1c34:da03:0007:7953
> 1: ibp9s0f0: node_type ca fw 2.42.5000 node_guid 1c34:da03:0007:7a60 sys_image_guid 1c34:da03:0007:7a63
> [leonro@c-235-8-1-005 ~]$ lspci |grep nox
> 08:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
> 09:00.0 Network controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> Thanks
>
Hi,

I tried different hosts with CX-3/CX-5 HCAs, and they all work fine. I
can only reproduce the problem on hosts with a somewhat older HCA:
03:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

The bug report at https://bugzilla.redhat.com/show_bug.cgi?id=2014094
also mentions a ConnectX HCA:

01:00.0 InfiniBand [0c06]: Mellanox Technologies MT25408A0-FCC-GI ConnectX, Dual Port 20Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s In... (rev b0)

With some added instrumentation, I have so far only narrowed it down to
this call (the pr_info() is my debug print):

        port = setup_port(coredev, port_num, &attr);
        if (IS_ERR(port)) {
                ret = PTR_ERR(port);
                pr_info("setup ports failed %d\n", ret);
                goto err_put;
        }

[   43.795268] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[   43.830809] setup ports failed -12
[   43.830814] infiniband mlx4_0: Couldn't register device with driver model

My guess is that the ConnectX HCA may be missing some features, which
leads to the ENOMEM (-12) above. I will continue instrumenting if there
are no other hints.
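
For anyone decoding the log: the -12 printed by the debug line is
-ENOMEM, recovered from the pointer returned by setup_port() via
IS_ERR()/PTR_ERR(). Below is a minimal userspace sketch of that
error-pointer idiom. The lowercase helpers are stand-ins I wrote for the
kernel macros in include/linux/err.h, and fake_setup_port() is purely
hypothetical; it is not the real setup_port() and says nothing about
which allocation actually fails on ConnectX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same bound the kernel uses: the top 4095 addresses encode errnos. */
#define MAX_ERRNO 4095

/* Stand-in for ERR_PTR(): store a negative errno in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Stand-in for IS_ERR(): true if the pointer lies in the errno range. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Stand-in for PTR_ERR(): recover the negative errno. */
static inline long ptr_err(const void *ptr)
{
	/* Userspace-only sanity check; the kernel helper has no assertion. */
	assert(is_err(ptr));
	return (long)(intptr_t)ptr;
}

/*
 * Hypothetical setup step: returns an encoded -ENOMEM when asked to
 * simulate an allocation failure, otherwise a valid object pointer.
 */
static void *fake_setup_port(int simulate_alloc_failure)
{
	static int port_obj;

	if (simulate_alloc_failure)
		return err_ptr(-ENOMEM);
	return &port_obj;
}
```

A caller checks is_err() on the result and, if it is set, passes
ptr_err() up as the return code, which is how a failed allocation deep
in the port setup would surface as "setup ports failed -12".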

Thanks


