Re: Unreliable disk detection order in 5.x

On 2021/11/11 10:01, Simon Kirby wrote:
> On Sun, Nov 07, 2021 at 11:51:45AM -0800, Bart Van Assche wrote:
> 
>> On 11/6/21 19:24, Simon Kirby wrote:
>>> This occurs regardless of the CONFIG_SCSI_SCAN_ASYNC setting, and
>>> also with scsi_mod.scan=sync on vendor kernels. All of these disks
>>> are coming from the same driver and card.
>>>
>>> I understand that using UUIDs, by-id, etc., is an option to work
>>> around this, but then we would have to push IDs for disks in every
>>> server to our configuration management. It does not seem that this
>>> change is really intentional.
>>
>> SCSI disk detection has been asynchronous on purpose for a long time. The most
>> recent commit I know of that changed SCSI disk scanning
>> behavior is commit f049cf1a7b67 ("scsi: sd: Rely on the driver core for
>> asynchronous probing").
>>
>> Please use one of the /dev/disk/by-*/* identifiers as Damien requested.
> 
> Hi Bart,
> 
> So, we're using DRBD on top of these, which means by-uuid is not
> available; we can use only by-id and by-path. by-id is dependent on disk
> models and serial numbers, and by-path is dependent on PCI bus details.
> Both are going to be a good deal more work to maintain, since they're
> both not just a simple enumeration.
> 
> I did try 5.14.17 with f049cf1a7b67 (and a065c0faacb1) reverted, and it
> does indeed restore the behaviour where sd* order appears to be reliable.
> Scan time (time until systemd starts) is within 4ms across 3 boots with
> and without the revert, but this is just our particular case.
> 
> I don't fully understand the scan process here, but I can understand the
> challenges in trying to parallelize it and still end up with a consistent
> enumerated list.

Even if disks were not scanned in parallel at boot, so that drives were named
consistently in port or LUN order, any run-time event that causes a drive to
"go away" and come back (e.g. a topology change event) can result in the drive
name changing. The order itself also depends on the LLD code: a driver change
can result in a different probe order, and therefore in different names. The
same applies if, say, you create or delete LUNs on a RAID system: you will get
some drive names at that time, but after a reboot and rescan the LUNs may be
presented with different names. /dev/sdX names are simply not reliable. For
consistent, reliable drive configurations, applications must use the
/dev/disk/by-*/* IDs.
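
As a rough illustration of how the stable names could be collected on each
host rather than maintained by hand, here is a minimal Python sketch. It
assumes the usual udev-managed /dev/disk/by-id and /dev/disk/by-path
directories are present; the function name stable_links is hypothetical and
not part of any existing tool.

```python
import os

# Hypothetical helper: list the persistent symlinks that currently point at
# the same device node as `target` (for example "/dev/sda").
def stable_links(target, link_dirs=("/dev/disk/by-id", "/dev/disk/by-path")):
    wanted = os.path.realpath(target)
    found = []
    for directory in link_dirs:
        # Skip directories that do not exist on this system.
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            link = os.path.join(directory, name)
            # Keep only symlinks that resolve to the requested node.
            if os.path.islink(link) and os.path.realpath(link) == wanted:
                found.append(link)
    return found

# Example call (the result depends entirely on the machine it runs on):
#   stable_links("/dev/sda")
```

Something along these lines could be run once per server to record the
by-id or by-path name behind each sdX name, so that the configuration refers
to the persistent link instead of the enumeration order.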

> 
> I guess you would agree that removing sd* entirely would not be an option
> because they've existed forever historically, but at the same time, the
> only time they really "work" now is as symlink targets for by-*, and in
> the case where only one disk exists at boot time. Do I have this right?
> 
> Simon-
> 


-- 
Damien Le Moal
Western Digital Research


