Re: Bug#1086520: linux-image-6.11.2-amd64: makes opensm fail to start

Hello Francesco,

[For newcomers: this is about a regression in 6.11. Details are
available at https://bugs.debian.org/1086520. The TL;DR is that opensm
works as expected on 6.10.11, while it fails to start on 6.11.7.]

On Mon, Nov 18, 2024 at 08:06:16PM +0100, Francesco Poli wrote:
> On Mon, 18 Nov 2024 09:58:03 +0100 Uwe Kleine-König wrote:
> 
> [...]
> > On Wed, Nov 13, 2024 at 11:15:03PM +0100, Francesco Poli wrote:
> > > On Mon, 11 Nov 2024 11:22:26 +0100 Uwe Kleine-König wrote:
> [...]
> > > > I guess the kernel provides a directory "/sys/class/infiniband_mad". Do
> > > > its contents look different on 6.10.x and 6.11.x?
> > > 
> > > I will look into this as soon as I can reboot the cluster head node.
> 
> I looked into this, while testing the new Debian Linux kernel that has
> just migrated to testing (which, once again, makes opensm fail to
> start, just like other 6.11.x versions).
> 
> With a working kernel:
> 
>   $ uname -v
>   #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22)
>   $ ls -altrF /sys/class/infiniband_mad/
>   total 0
>   lrwxrwxrwx  1 root root    0 Nov  4 15:58 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
>   lrwxrwxrwx  1 root root    0 Nov  4 15:58 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
>   lrwxrwxrwx  1 root root    0 Nov 11 15:54 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
>   lrwxrwxrwx  1 root root    0 Nov 11 15:54 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
>   drwxr-xr-x  2 root root    0 Nov 11 15:54 ./
>   drwxr-xr-x 72 root root    0 Nov 11 15:54 ../
>   -r--r--r--  1 root root 4096 Nov 11 15:54 abi_version
>   $ cat /sys/class/infiniband_mad/abi_version 
>   5
> 
> With a kernel that makes opensm fail to start:
> 
>   $ uname -v
>   #1 SMP PREEMPT_DYNAMIC Debian 6.11.7-1 (2024-11-09)
>   $ ls -altrF /sys/class/infiniband_mad/
>   total 0
>   drwxr-xr-x 73 root root    0 Nov 18 09:41 ../
>   -r--r--r--  1 root root 4096 Nov 18 09:41 abi_version
>   lrwxrwxrwx  1 root root    0 Nov 18 09:41 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
>   lrwxrwxrwx  1 root root    0 Nov 18 09:41 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
>   drwxr-xr-x  2 root root    0 Nov 18 09:43 ./
>   $ cat /sys/class/infiniband_mad/abi_version
>   5
> 
> As you can see, a couple of files (symlinks) are missing here...

It looks like the commit that is biting you is

https://git.kernel.org/linus/50660c5197f52b8137e223dc3ba8d43661179a1d

So if you bisect, try 50660c5197f52b8137e223dc3ba8d43661179a1d and its
parent 24943dcdc156cf294d97a36bf5c51168bf574c22 first.
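
To make "try these two first" concrete, here is a minimal sketch that
prints the suspect commit and its first parent from a local kernel clone,
so each can be checked out, built and booted in turn. The function name
bisect_candidates and the clone path in the usage note are made up for
illustration; only standard git rev-parse calls are used.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print a suspect commit and
# its first parent, one full hash per line.
#   $1 = path to a git clone, $2 = suspect revision
bisect_candidates() {
    repo=$1
    suspect=$2
    # First line: the suspect itself, resolved to a full commit hash.
    git -C "$repo" rev-parse --verify "${suspect}^{commit}" &&
    # Second line: its first parent.
    git -C "$repo" rev-parse --verify "${suspect}^1"
}
```

For example, `bisect_candidates ~/src/linux 50660c5197f52b8137e223dc3ba8d43661179a1d`
(the path is an assumption) should list that commit followed by its
parent. If the parent boots with a working opensm and the suspect does
not, no further bisection is needed.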

I don't know about infiniband, but I'd say: either your machine doesn't
have these issmX devices and opensm should cope with that, or these
issmX devices are available, in which case
50660c5197f52b8137e223dc3ba8d43661179a1d is buggy.
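
To compare kernels quickly, a small check along these lines might help.
It is only a sketch: it assumes that every umadN entry is expected to
have an issmN sibling, as in the 6.10.11 listing quoted above, and the
function name missing_issm and its optional directory argument are
invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every issmN entry that is
# missing next to an existing umadN entry. Returns 1 if any are missing.
#   $1 = directory to inspect (defaults to the sysfs class directory)
missing_issm() {
    dir=${1:-/sys/class/infiniband_mad}
    rc=0
    for umad in "$dir"/umad*; do
        # Skip the literal pattern when no umad entries exist at all.
        [ -e "$umad" ] || [ -L "$umad" ] || continue
        n=${umad##*/umad}
        if [ ! -e "$dir/issm$n" ] && [ ! -L "$dir/issm$n" ]; then
            echo "issm$n"
            rc=1
        fi
    done
    return $rc
}
```

Called without arguments on a system matching the 6.11.7 listing above,
this should report issm0 and issm1; on one matching the 6.10.11 listing
it should print nothing.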

> Does this ring a bell?

It doesn't for me, but maybe Mark Zhang or someone else among the new
recipients has an idea?

Best regards
Uwe
