On Thu, 21 Nov 2024 11:04:13 +0100 Uwe Kleine-König wrote:

[...]
> It looks like the commit that is biting you is
>
>   https://git.kernel.org/linus/50660c5197f52b8137e223dc3ba8d43661179a1d
>
> So if you bisect, try 50660c5197f52b8137e223dc3ba8d43661179a1d and its
> parent 24943dcdc156cf294d97a36bf5c51168bf574c22 first.

I started to bisect.
The first surprise is that 50660c5197f52b8137e223dc3ba8d43661179a1d
is good...   :-o

  $ git checkout 50660c5197f52b8137e223dc3ba8d43661179a1d
  $ make -j 12 my_defconfig bindeb-pkg

  [install and reboot with this kernel version]

  # ls /sys/class/infiniband_mad/ -altrF
  total 0
  drwxr-xr-x 70 root root    0 Nov 25 12:05 ../
  -r--r--r--  1 root root 4096 Nov 25 12:05 abi_version
  lrwxrwxrwx  1 root root    0 Nov 25 12:05 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 25 12:05 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  lrwxrwxrwx  1 root root    0 Nov 25 12:08 issm1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/issm1/
  lrwxrwxrwx  1 root root    0 Nov 25 12:08 issm0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/issm0/
  drwxr-xr-x  2 root root    0 Nov 25 12:08 ./

  [InfiniBand works]

  $ git bisect start
  $ git bisect good
  $ git checkout v6.11
  $ make -j 12 my_defconfig bindeb-pkg

  [install and reboot with this kernel version]

  # ls /sys/class/infiniband_mad/ -altrF
  total 0
  drwxr-xr-x 70 root root    0 Nov 25 12:29 ../
  -r--r--r--  1 root root 4096 Nov 25 12:29 abi_version
  lrwxrwxrwx  1 root root    0 Nov 25 12:29 umad0 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.0/infiniband_mad/umad0/
  lrwxrwxrwx  1 root root    0 Nov 25 12:29 umad1 -> ../../devices/pci0000:80/0000:80:01.1/0000:81:00.1/infiniband_mad/umad1/
  drwxr-xr-x  2 root root    0 Nov 25 12:30 ./

  [InfiniBand fails, because OpenSM fails to start]

  $ git bisect bad
  Bisecting: 7036 revisions left to test after this (roughly 13 steps)
  [b3ce7a30847a54a7f96a35e609303d8afecd460b] Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel
  $ make -j 12 my_defconfig bindeb-pkg

Woooha, 13 steps are a lot...
I went on until 10 steps are left:

  [test b3ce7a30847a54a7f96a35e609303d8afecd460b]

  $ git bisect good
  Bisecting: 3385 revisions left to test after this (roughly 12 steps)
  [fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c] Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

  [test fbc90c042cd1dc7258ebfebe6d226017e5b5ac8c]

  $ git bisect bad
  Bisecting: 1763 revisions left to test after this (roughly 11 steps)
  [09ea8089abb5d851ce08a9b1a43706e42ef39db2] Merge tag 'staging-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

  [test 09ea8089abb5d851ce08a9b1a43706e42ef39db2]

  $ git bisect bad
  Bisecting: 910 revisions left to test after this (roughly 10 steps)
  [4305ca0087dd99c3c3e0e2ac8a228b7e53a21c78] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Since I could not afford to keep the cluster out of service any longer
(each step takes at least 20 or 25 minutes: build + install + reboot +
check InfiniBand), I decided to return the cluster to service.

I will try to continue to bisect by testing the resulting kernels on
a compute node: there's no OpenSM there and it cannot run anyway, if
there's another OpenSM on the same InfiniBand network.
However, I can check whether those issm* symlinks are created in
/sys/class/infiniband_mad/

I really hope that this is enough to pinpoint the first bad commit...
Any better ideas?

-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF  FB82 3925 3E1C 27E1 1F69 BFFE
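
The check on a compute node could be scripted roughly as follows. This is only a minimal sketch: the helper name check_issm and the optional directory argument are hypothetical, and only the /sys/class/infiniband_mad path and the issm* names come from the listings in this message. It also assumes that the presence of the issm* symlinks really tracks the regression, which is not yet confirmed.

```shell
#!/bin/sh
# Hypothetical helper: report whether any issm* entry exists in the
# umad class directory of the currently booted kernel.

# check_issm DIR: return 0 if DIR contains at least one issm* entry,
# 1 otherwise.
check_issm() {
    dir=$1
    for entry in "$dir"/issm*; do
        # An unmatched glob stays literal, so verify that the entry
        # exists; -L also accepts a symlink whose target is missing.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            return 0
        fi
    done
    return 1
}

# Default to the sysfs directory shown in the listings; a different
# directory can be passed as the first argument (an assumption made
# here for testing only).
sysfs_dir=${1:-/sys/class/infiniband_mad}

if check_issm "$sysfs_dir"; then
    echo "good: issm* entries present in $sysfs_dir"
else
    echo "bad: no issm* entries in $sysfs_dir"
fi
```

After booting each bisection kernel on the node, the printed verdict would indicate whether to mark that revision with git bisect good or git bisect bad in the build tree.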