Re: Bug#1086520: linux-image-6.11.2-amd64: makes opensm fail to start

On Thu, 21 Nov 2024 11:04:13 +0100 Uwe Kleine-König wrote:

[...]
> It looks like the commit that is biting you is
> 
> https://git.kernel.org/linus/50660c5197f52b8137e223dc3ba8d43661179a1d
> 
> So if you bisect, try 50660c5197f52b8137e223dc3ba8d43661179a1d and its
> parent 24943dcdc156cf294d97a36bf5c51168bf574c22 first.

I started to bisect.

The first surprise is that 50660c5197f52b8137e223dc3ba8d43661179a1d is
good...   :-o

  $ git checkout 50660c5197f52b8137e223dc3ba8d43661179a1d
  $ make -j 12 my_defconfig bindeb-pkg

  [install and reboot with this kernel version]

  # ls /sys/class/infiniband_mad/ -altrF
  total 0
  drwxr-xr-x 70 root root    0 Nov 25 12:05 ../
  -r--r--r--  1 root root 4096 Nov 25 12:05 abi_version
  lrwxrwxrwx  1 root root    0 Nov 25 12:05 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 25 12:05 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  lrwxrwxrwx  1 root root    0 Nov 25 12:08 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
  lrwxrwxrwx  1 root root    0 Nov 25 12:08 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
  drwxr-xr-x  2 root root    0 Nov 25 12:08 ./

  [InfiniBand works]

  $ git bisect start
  $ git bisect good
  $ git checkout v6.11
  $ make -j 12 my_defconfig bindeb-pkg

  [install and reboot with this kernel version]

  # ls /sys/class/infiniband_mad/ -altrF
  total 0
  drwxr-xr-x 70 root root    0 Nov 25 12:29 ../
  -r--r--r--  1 root root 4096 Nov 25 12:29 abi_version
  lrwxrwxrwx  1 root root    0 Nov 25 12:29 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 25 12:29 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  drwxr-xr-x  2 root root    0 Nov 25 12:30 ./

  [InfiniBand fails, because OpenSM fails to start]

  $ git bisect bad
  Bisecting: 7036 revisions left to test after this (roughly 13 steps)
  [b3ce7a30847a54a7f96a35e609303d8afecd460b] Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel
  $ make -j 12 my_defconfig bindeb-pkg


Woooha, 13 steps are a lot...

I went on until about 10 steps were left:

  [test b3ce7a30847a54a7f96a35e609303d8afecd460b]
  $ git bisect good
  Bisecting: 3385 revisions left to test after this (roughly 12 steps)
  [fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c] Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
  
  [test fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c]
  $ git bisect bad
  Bisecting: 1763 revisions left to test after this (roughly 11 steps)
  [09ea8089abb5d851ce08a9b1a43706e42ef39db2] Merge tag 'staging-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

  [test 09ea8089abb5d851ce08a9b1a43706e42ef39db2]
  $ git bisect bad
  Bisecting: 910 revisions left to test after this (roughly 10 steps)
  [4305ca0087dd99c3c3e0e2ac8a228b7e53a21c78] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi


Since I could not afford to keep the cluster out of service any longer
(each step takes at least 20 or 25 minutes: build + install + reboot +
check InfiniBand), I decided to return the cluster to service.
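
As a rough sketch of the arithmetic, assuming the step count reported by
git stays accurate (the function name below is made up for illustration),
the remaining 10 steps would cost somewhere between 200 and 250 minutes
of downtime:

```shell
#!/bin/sh
# Hypothetical helper: estimate the wall-clock time left for a bisect.
# $1 = remaining steps, $2 and $3 = minimum and maximum minutes per step.
bisect_time_left() {
    steps="$1"; min="$2"; max="$3"
    echo "$((steps * min))-$((steps * max)) minutes"
}

# The figures quoted above: 10 steps left, 20 to 25 minutes each.
bisect_time_left 10 20 25
```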

I will try to continue bisecting by testing the resulting kernels on a
compute node: there's no OpenSM there, and it cannot run anyway if
there's another OpenSM on the same InfiniBand network.
However, I can check whether those issm* symlinks are created in
/sys/class/infiniband_mad/.
I really hope that this is enough to pinpoint the first bad
commit...
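
The check on the compute node could be scripted along these lines. This
is only a sketch: the function name is hypothetical, and it assumes that
the presence of at least one issm* entry is a reliable stand-in for
"OpenSM would start", which I have not verified.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one issm* entry exists in the
# given directory (default: /sys/class/infiniband_mad), fail otherwise.
has_issm_nodes() {
    dir="${1:-/sys/class/infiniband_mad}"
    for entry in "$dir"/issm*; do
        # If nothing matches, the glob stays literal and both tests fail.
        # -e follows symlinks; -L also accepts a dangling symlink.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            return 0
        fi
    done
    return 1
}
```

After each reboot, something like "has_issm_nodes && echo good || echo bad"
would give the verdict to pass to git bisect on the build machine.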

Any better ideas?


-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82  3925 3E1C 27E1 1F69 BFFE
