Re: Improper Naming in /dev/disk/by-id and Drives Offline

On Wed, Sep 10, 2014 at 8:53 PM, Greg KH <greg@xxxxxxxxx> wrote:
> On Wed, Sep 10, 2014 at 08:34:06PM -0600, Brandon R Schwartz wrote:
>> Hi,
>>
>> I'm working on a particular issue (possibly two separate issues) where
>> our HDDs are (1) getting mislabeled in /dev/disk/by-id and (2)
>> dropping offline even though drive and controller logs show that the
>> drive is communicating and working as expected.  I don't have much
>> knowledge on the udev side of things so it would be great if someone
>> could offer some insight into the way udev assigns device names and if
>> there are thoughts as to why the OS cannot see the drive in certain
>> cases (timing issue?).
>>
>> The first issue, the mislabeling problem, is that on reboots or power
>> cycles we occasionally see our drives become mislabeled in
>> /dev/disk/by-id.  We expect to see something like:
>>
>> ata-ST3000DM001-1CH166_W1F26HKK
>> ata-ST3000DM001-1CH166_Z1F2FBBY
>>
>> But instead we see:
>>
>> ata-ST3000DM001-1CH166_W1F26HKK
>> scsi-35000c500668a9bdb
>>
>> The "scsi" drive is assigned a drive letter and the OS can communicate
>> with the drive.  Drive logs and controller logs show the drive is
>> working properly, but for some reason it's getting labeled incorrectly
>> in /dev/disk/by-id.  We have looked through dmesg and enabled logging
>> in udev (udevadm control --log-priority=debug), but we have not seen
>> where these labels are coming from.
>
> Sounds like blkid didn't read the uuid properly.  Is this happening in
> your initrd?  Is this a systemd init system, or something else?  What
> distro / version is this?  What kernel version is this?
>

Hi Greg,

The distro is RHEL 6.3 with kernel version 2.6.32.  We have also seen
the issue on a Debian-based system with kernel 3.2.45.  We ran into
this issue again yesterday on RHEL and tested the command 'udevadm
trigger' and it repopulated /dev/disk/by-id with the correct
information.  Is there another level of debugging that we can enable
to see where the information might be getting read improperly?
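In case it helps, this is roughly how we could detect the mislabeling
automatically after each boot.  It is only a sketch: the function name
find_disks_without_ata_link is made up for illustration, and it assumes
that all drives are SATA (so each should get an ata-* link), that every
entry in /dev/disk/by-id is a symlink to a device node, and that
partition links end in -partN.

```python
import os
import re
from collections import defaultdict

# Partition links look like "ata-..._SERIAL-part1"; we only want whole disks.
PART_SUFFIX = re.compile(r"-part\d+$")


def find_disks_without_ata_link(by_id_dir="/dev/disk/by-id"):
    """Return {device_node: [link names]} for disks that have no ata-* link."""
    links_by_target = defaultdict(list)
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if not os.path.islink(path) or PART_SUFFIX.search(name):
            continue
        # Group link names by the device node they resolve to.
        links_by_target[os.path.realpath(path)].append(name)
    return {
        target: names
        for target, names in links_by_target.items()
        if not any(n.startswith("ata-") for n in names)
    }


if __name__ == "__main__":
    for node, names in find_disks_without_ata_link().items():
        print(f"{node}: only {', '.join(names)}")
```

On a boot that shows the problem, this should report the device node
behind the scsi-* entry, since that node would have no ata-* link.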

>> The second issue is slightly related to the first in that it appears
>> during the same power cycle/reboot test.  We have noticed that on
>> occasion, our drives will not be detected by the OS (not listed in
>> /dev/disk/by-id) at all.  However, if we look at drive logs and
>> controller logs, we don't see any issue.  The controller is able to
>> see the drives and communicate with them, but the OS is unable to.
>> Any ideas as to why communication is not established?
>>
>> Also, is there a way to refresh the /dev/disk/by-id listing (udevadm
>> trigger?) once the OS has booted in order to rescan for attached
>> devices and repopulate it?  Thanks for any information and let me know
>> if you need logs or anything else.
>
> That depends on your distro, and how it's set up.  You could "coldplug"
> the by-id values by using udevadm trigger, have you tried that?  You
> shouldn't have to do it, as it sounds like you have a boot time race
> condition somewhere...

What do you mean by 'coldplug' the by-id values with udevadm trigger?
This second issue occurs much less frequently, so we are still waiting
for a failure to test against.  We are also looking into ways that we can
exacerbate the issue if it is a boot time race condition.
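
For the reboot test itself, a check along these lines could flag any
boot where an expected drive has no ata-* entry, whether it is
mislabeled or missing entirely.  Again a sketch only: missing_serials is
a hypothetical helper, and it assumes the whole-disk link name ends in
"_<serial>" as in the examples above.

```python
import os


def missing_serials(expected_serials, by_id_dir="/dev/disk/by-id"):
    """Return the expected serials that appear in no whole-disk ata-* link."""
    ata_names = [n for n in os.listdir(by_id_dir) if n.startswith("ata-")]
    # Partition links end in "-partN", so endswith("_" + serial) skips them.
    return [
        serial
        for serial in expected_serials
        if not any(name.endswith("_" + serial) for name in ata_names)
    ]


if __name__ == "__main__":
    # Serials taken from the two drives in the example listing.
    for serial in missing_serials(["W1F26HKK", "Z1F2FBBY"]):
        print(f"no ata-* link for drive with serial {serial}")
```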

>
> thanks,
>
> greg k-h

Regards,
Brandon

-- 
Brandon Schwartz