On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
> Hi,
> I'm working on a particular issue (possibly two separate issues) where
> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
> dropping offline even though drive and controller logs show that the
> drive is communicating and working as expected.  I don't have much
> knowledge of the udev side of things, so it would be great if someone
> could offer some insight into the way udev assigns device names and if
> there are thoughts as to why the OS cannot see the drive in certain
> cases (timing issue?).
> The first issue, the mislabeling problem, is that on reboots or power
> cycles we occasionally see our drives become mislabeled in
> /dev/disk/by-id.  We expect to see something like:
> ata-ST3000DM001-1CH166_W1F26HKK
> ata-ST3000DM001-1CH166_Z1F2FBBY
> But instead we see:
> ata-ST3000DM001-1CH166_W1F26HKK
> scsi-35000c500668a9bdb
> The "scsi" drive is assigned a drive letter and the OS can communicate
> with the drive.  Drive logs and controller logs show the drive is
> working properly, but for some reason it's getting labeled incorrectly
> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
> in udev (udevadm control --log-priority=debug), but we have not seen
> where these labels are coming from.

Sounds like blkid didn't read the UUID properly.  Is this happening in
your initrd?  Is this a systemd init system, or something else?  What
distro / version is this?  What kernel version is this?
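To spot which disks were hit by the mislabeling, it can help to group the /dev/disk/by-id links by the device node they resolve to and flag any device that has a scsi- link but no ata- link, as in the example above. This is a minimal sketch, assuming the usual layout where /dev/disk/by-id holds symlinks to kernel device nodes; the function names and the "scsi- without ata-" heuristic are hypothetical, not part of udev, and SAS disks will legitimately show up with only scsi- names.

```python
import os
from collections import defaultdict


def group_by_target(by_id_dir="/dev/disk/by-id"):
    """Map each resolved device node to the sorted by-id link names pointing at it."""
    groups = defaultdict(list)
    for name in os.listdir(by_id_dir):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            groups[os.path.realpath(link)].append(name)
    return {dev: sorted(names) for dev, names in groups.items()}


def missing_ata_name(groups):
    """Return devices that have a scsi- link but no ata- link (hypothetical heuristic)."""
    suspects = []
    for dev, names in sorted(groups.items()):
        has_ata = any(n.startswith("ata-") for n in names)
        has_scsi = any(n.startswith("scsi-") for n in names)
        if has_scsi and not has_ata:
            suspects.append(dev)
    return suspects
```

Running print(missing_ata_name(group_by_target())) after each reboot in the power-cycle test would list the device nodes that came up with only the scsi- style name.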

> The second issue is slightly related to the first in that it appears
> during the same power cycle/reboot test.  We have noticed that on
> occasion, our drives will not be detected by the OS (not listed in
> /dev/disk/by-id) at all.  However, if we look at drive logs and
> controller logs, we don't see any issue.  The controller is able to
> see the drives and communicate with them, but the OS is unable to.
> Any ideas as to why communication is not established?
> Also, is there a way to refresh the /dev/disk/by-id listing (udevadm
> trigger?) once the OS has booted in order to rescan for attached
> devices and repopulate it?  Thanks for any information and let me know
> if you need logs or anything else.

That depends on your distro, and how it's set up.  You could "coldplug"
the by-id values by using udevadm trigger, have you tried that?  You
shouldn't have to do it, as it sounds like you have a boot time race
condition somewhere...
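If you do try the coldplug, one way to see what it actually changed is to snapshot /dev/disk/by-id before and after. This is a minimal sketch, assuming you run the trigger yourself as root (for example "udevadm trigger --action=add --subsystem-match=block" followed by "udevadm settle" so the event queue is drained before the second snapshot); the helper names are hypothetical.

```python
import os


def snapshot(by_id_dir="/dev/disk/by-id"):
    """Return {link name: resolved device node} for every symlink in by_id_dir."""
    snap = {}
    for name in os.listdir(by_id_dir):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            snap[name] = os.path.realpath(link)
    return snap


def diff_snapshots(before, after):
    """Return (added, removed, retargeted) link names between two snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    # Names present in both snapshots but now pointing at a different device node.
    retargeted = sorted(
        name for name in before.keys() & after.keys() if before[name] != after[name]
    )
    return added, removed, retargeted
```

Anything that shows up in "added" only after the trigger is a link that was missing at boot, which would fit the boot time race theory rather than a problem with the drive itself.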


greg k-h
