On Wed, Dec 04, 2024 at 06:13:56PM +0100, Francesco Poli wrote:
> On Wed, 4 Dec 2024 17:37:05 +0100 Uwe Kleine-König wrote:
>
> > Hello Francesco,
>
> Hello Uwe,
>
> [...]
> > I wonder if you could test a firmware upgrade or the above patch. Would
> > be nice to know if there are still some things to do for us (= Debian
> > kernel team) here.
>
> Yes, I've finally got around to upgrading the firmware.
>
> And today I had a time window, where I could reboot the cluster head
> node.
> After the reboot, the InfiniBand network works correctly:
>
> $ uname -v
> #1 SMP PREEMPT_DYNAMIC Debian 6.11.10-1 (2024-11-23)
> $ ls -altrF /sys/class/infiniband_mad/
> total 0
> lrwxrwxrwx 1 root root 0 Dec 4 10:15 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
> lrwxrwxrwx 1 root root 0 Dec 4 10:15 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
> drwxr-xr-x 2 root root 0 Dec 4 10:17 ./
> drwxr-xr-x 73 root root 0 Dec 4 10:17 ../
> -r--r--r-- 1 root root 4096 Dec 4 10:17 abi_version
> lrwxrwxrwx 1 root root 0 Dec 4 18:08 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
> lrwxrwxrwx 1 root root 0 Dec 4 18:08 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
> # ethtool -i ibp129s0f0
> driver: mlx5_core[ib_ipoib]
> version: 6.11.10-amd64
> firmware-version: 20.43.1014 (MT_0000000224)
> expansion-rom-version:
> bus-info: 0000:81:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
> # ethtool -i ibp129s0f1
> driver: mlx5_core[ib_ipoib]
> version: 6.11.10-amd64
> firmware-version: 20.43.1014 (MT_0000000224)
> expansion-rom-version:
> bus-info: 0000:81:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
> $ ps aux | grep opens[m]
> root 1150 0.0 0.0 1560776 3636 ? Ssl 10:15 0:00 /usr/sbin/opensm --guid 0x9c63c00300033240 --log_file /var/log/opensm.0x9c63c00300033240.log
>
> >
> > If everything is fine for you, I'd like to close this bug.
>
> I am closing the Debian bug report right now.
> Thanks to everyone who has been involved for the great and kind help!

Thanks a lot for your help. You helped a lot.

BTW, we have an official fix [1], but it hasn't been sent yet, as we want
to finish all the various tests first (E2E, QA, etc.).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/commit/?h=rdma-next&id=09754c1e5d0d204747928290cc8c6f4371fd4c6a

>
> >
> > Best regards
>
> Have a nice evening. :-)
>
> --
> http://www.inventati.org/frx/
> There's not a second to spare! To the laboratory!
> ..................................................... Francesco Poli .
> GnuPG key fpr == CA01 1147 9CD2 EFDF FB82 3925 3E1C 27E1 1F69 BFFE