On 2021/11/05 15:46, Simon Kirby wrote: > I'm seeing disk detection order changing across reboots on 5.x kernels > (5.4, 5.10, 5.14), but not 4.9, 4.14, 4.19, with megaraid_sas (Dell > PERC_H700). With 13 disks and 5.14.14, the order changes almost always. > > I did initially try to bisect this issue, but it seems to become more > rare in earlier kernels, and there are some non-booting problems between > 4.x and 5.x. > > The most common effect is swapping of sda with sdb, or two neighboring > devices in the list; for example: > > # diff -u lsblk-S-5.10.0 lsblk-S-5.10.0-2 > --- lsblk-S-5.10.0 2021-11-04 15:23:23.767008360 -0400 > +++ lsblk-S-5.10.0-2 2021-11-04 17:34:37.748310196 -0400 > @@ -1,6 +1,6 @@ > NAME HCTL TYPE VENDOR MODEL REV TRAN > -sda 0:2:0:0 disk DELL PERC_H700 2.10 > -sdb 0:2:2:0 disk DELL PERC_H700 2.10 > +sda 0:2:2:0 disk DELL PERC_H700 2.10 > +sdb 0:2:0:0 disk DELL PERC_H700 2.10 > sdc 0:2:3:0 disk DELL PERC_H700 2.10 > sdd 0:2:4:0 disk DELL PERC_H700 2.10 > sde 0:2:5:0 disk DELL PERC_H700 2.10 > > This is happening on vendor (Debian 5.10.0) and home-built kernels, and > on a variety of hosts. On all kernels, the detection printks come up in > an interesting order, but in older kernels, it always ends up with an > sd-name that is ordered by SCSI ID ascending: > > [ 2.289776] sd 0:2:0:0: [sda] 999030784 512-byte logical blocks: (512 GB/476 GiB) > [ 2.289918] sd 0:2:4:0: [sdd] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.289947] sd 0:2:3:0: [sdc] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.290032] sd 0:2:6:0: [sdf] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.290210] sd 0:2:7:0: [sdg] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.290248] sd 0:2:9:0: [sdi] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.290323] sd 0:2:2:0: [sdb] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.290461] sd 0:2:5:0: [sde] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > [ 2.290476] sd 0:2:8:0: [sdh] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB) > > Full "dmesg" is saved here: https://0x.ca/sim/ref/5.10.0/dmesg > > Any ideas on suggestions on what I could use to track down what changed > here, or ideas on what might have influenced it? Most distro kernels are now compiled with asynchronous device scan enabled to speedup the boot process. This potentially result in the device names changing across reboots. Reliable device names are provided by udev under /dev/disk/by-id, by-uuid etc. You can turn off scsi asynchronous device scan using the scsi_mod.scan=sync kernel boot argument, or disable the CONFIG_SCSI_SCAN_ASYNC option for your kernel (device drivers -> scsi device support -> asynchronous scsi scanning). But even with synchronous scanning, device names are not reliable and there are no guarantees that one particular device will always have the same name. -- Damien Le Moal Western Digital Research