Re: Unreliable disk detection order in 5.x

On 2021/11/05 15:46, Simon Kirby wrote:
> I'm seeing disk detection order changing across reboots on 5.x kernels
> (5.4, 5.10, 5.14), but not 4.9, 4.14, 4.19, with megaraid_sas (Dell
> PERC_H700). With 13 disks and 5.14.14, the order changes almost always.
> 
> I did initially try to bisect this issue, but it seems to become more
> rare in earlier kernels, and there are some non-booting problems between
> 4.x and 5.x.
> 
> The most common effect is swapping of sda with sdb, or two neighboring
> devices in the list; for example:
> 
> # diff -u lsblk-S-5.10.0 lsblk-S-5.10.0-2
> --- lsblk-S-5.10.0      2021-11-04 15:23:23.767008360 -0400
> +++ lsblk-S-5.10.0-2    2021-11-04 17:34:37.748310196 -0400
> @@ -1,6 +1,6 @@
>  NAME HCTL       TYPE VENDOR   MODEL      REV TRAN
> -sda  0:2:0:0    disk DELL     PERC_H700 2.10
> -sdb  0:2:2:0    disk DELL     PERC_H700 2.10
> +sda  0:2:2:0    disk DELL     PERC_H700 2.10
> +sdb  0:2:0:0    disk DELL     PERC_H700 2.10
>  sdc  0:2:3:0    disk DELL     PERC_H700 2.10
>  sdd  0:2:4:0    disk DELL     PERC_H700 2.10
>  sde  0:2:5:0    disk DELL     PERC_H700 2.10
> 
> This is happening on vendor (Debian 5.10.0) and home-built kernels, and
> on a variety of hosts. On all kernels, the detection printks come up in
> an interesting order, but in older kernels, it always ends up with an
> sd-name that is ordered by SCSI ID ascending:
> 
> [    2.289776] sd 0:2:0:0: [sda] 999030784 512-byte logical blocks: (512 GB/476 GiB)
> [    2.289918] sd 0:2:4:0: [sdd] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.289947] sd 0:2:3:0: [sdc] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.290032] sd 0:2:6:0: [sdf] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.290210] sd 0:2:7:0: [sdg] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.290248] sd 0:2:9:0: [sdi] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.290323] sd 0:2:2:0: [sdb] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.290461] sd 0:2:5:0: [sde] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> [    2.290476] sd 0:2:8:0: [sdh] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
> 
> Full "dmesg" is saved here: https://0x.ca/sim/ref/5.10.0/dmesg
> 
> Any ideas on suggestions on what I could use to track down what changed
> here, or ideas on what might have influenced it?

Most distro kernels are now compiled with asynchronous device scanning enabled
to speed up the boot process. This can result in device names changing across
reboots. Reliable device names are provided by udev under /dev/disk/by-id,
/dev/disk/by-uuid, etc.
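
To see which sdX name each persistent name currently resolves to, something
like the following Python sketch may help. It only follows the symlinks that
udev creates; the function name stable_names and its by_id_dir parameter are
made up for this example, and it assumes udev has populated /dev/disk/by-id.

```python
import os


def stable_names(by_id_dir="/dev/disk/by-id"):
    """Map each persistent udev name to the device node it points at now.

    by_id_dir is a parameter only so the function can be pointed at another
    directory (for example /dev/disk/by-uuid); it is not a udev setting.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        # No udev-managed directory here (assumption: nothing to report).
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath resolves the symlink to the current kernel name,
            # e.g. /dev/sda on one boot and /dev/sdb on the next.
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, dev in stable_names().items():
        print(f"{name} -> {dev}")
```

Comparing the output across two boots should show the same by-id names on the
left even when the sdX names on the right have moved around.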

You can turn off asynchronous SCSI device scanning using the scsi_mod.scan=sync
kernel boot argument, or disable the CONFIG_SCSI_SCAN_ASYNC option for your
kernel (Device Drivers -> SCSI device support -> Asynchronous SCSI scanning).
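
To check which scan mode a running kernel ended up with, the scsi_mod
parameter can be read back from sysfs. A small Python sketch, assuming the
parameter is exposed at /sys/module/scsi_mod/parameters/scan on your kernel
(the helper name scsi_scan_mode is made up for this example):

```python
from pathlib import Path


def scsi_scan_mode(param_path="/sys/module/scsi_mod/parameters/scan"):
    """Return the current scsi_mod scan setting, or None if it is unreadable.

    The default path is an assumption about where sysfs exposes the
    parameter; pass another path if your system differs.
    """
    try:
        return Path(param_path).read_text().strip()
    except OSError:
        # File missing or not readable: report "unknown" rather than guess.
        return None


if __name__ == "__main__":
    mode = scsi_scan_mode()
    print(mode if mode is not None else "unknown (parameter not readable)")
```

After booting with scsi_mod.scan=sync this would be expected to print "sync".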

But even with synchronous scanning, device names are not reliable and there are
no guarantees that one particular device will always have the same name.



-- 
Damien Le Moal
Western Digital Research


