Re: jbod + SMART: how to identify failing disks?

Hello again,

So whatever magic allows the Dell MD1200 to report the slot position for
each disk isn't present in your JBODs. Time for something else.

There are two sides to your problem:

1) Identifying which disk is where in your JBOD

Quite easy. Again, I'd go for a udev rule plus a script that either
renames the disks entirely or creates a symlink with a name like
"jbodX-slotY", so you can easily tell which disk is which. The
end-device-to-slot mapping can be static in the script: you only need to
identify once the order in which the kernel scans the slots, and then
you can map them.
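A minimal sketch of such a script, written for a POSIX shell and assuming the disks show up with an "end_device-..." component in their sysfs path, as in the listing quoted below. The script name, its install path, the rules file name and the "jbod1" enclosure label are made up, and the three table entries are only the slots reported in the quoted message; the real table has to come from your own observations.

```shell
#!/bin/sh
# jbod-slot.sh (hypothetical name): print a "jbodX-slotY" name for a disk,
# given its udev devpath. Meant to be called from a udev rule such as
# (hypothetical file /etc/udev/rules.d/61-jbod-slot.rules, one line):
#
#   KERNEL=="sd*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/local/sbin/jbod-slot.sh %p", SYMLINK+="disk/by-slot/%c"
#
# udev replaces %p with the devpath and %c with this script's output;
# a non-zero exit status makes the rule not match, so no symlink is made.

slot_for_devpath() {
    local enddev
    # Give up if the path has no SAS end device in it.
    case $1 in
        */end_device-*) ;;
        *) return 1 ;;
    esac
    enddev=${1##*/end_device-}        # e.g. "1:1:2/target1:0:3/..."
    enddev=end_device-${enddev%%/*}   # e.g. "end_device-1:1:2"
    # Static end_device -> slot table. PLACEHOLDER entries taken from the
    # slots reported in this thread; fill in the rest for your hardware.
    case $enddev in
        end_device-1:1:2) echo "jbod1-slot12" ;;
        end_device-1:1:5) echo "jbod1-slot9" ;;
        end_device-1:2:1) echo "jbod1-slot5" ;;
        *) return 1 ;;
    esac
}

# Only act when called with an argument, as udev does.
if [ "$#" -gt 0 ]; then
    slot_for_devpath "$1"
fi
```

Because the table is keyed on end_device names, it inherits exactly the weakness described next: if end_device numbers are handed out in a different order, the symlinks silently point at the wrong disks.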

But that mapping won't survive a disk swap, or a change in scan order
after a kernel upgrade, so it's not enough on its own.

2) Finding an identifier that is independent of hot-plugs and scan order

That's the tricky part. If you remove a disk from your JBOD and replace
it with another one, the new one will get a different "sdX" name and, in
my experience, even a different "end_device-..." name. Since you want
the new disk to get exactly the same name or symlink as the previous
one, you have to find something immutable in the device path or
(better) in the udev attributes.

Whether this is possible at all depends on your specific hardware
combination, so you will have to try it yourself.

Suggested methodology:

1) Write down the serial number of the drive in any one slot, and find
its device name (sdX) by matching that serial against the output of
"smartctl -i /dev/sd..."

2) Grab the full /sys path and the list of udev attributes:
    readlink -f /sys/class/block/sdX
    udevadm info --attribute-walk /dev/sdX

3) Pull that disk and replace it. Check the logs to find the new disk's
device name (sdY).

4) Rerun the commands from step 2 with sdY.

5) Compare the outputs and find something in the path or in the
attributes that did not change and is unique to that slot (i.e. not
something shared by all disks, such as an attribute of a common parent).
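Steps 2 to 5 can be scripted roughly as follows. This is a sketch that assumes readlink and udevadm behave as in the commands above; the function names and snapshot file names are hypothetical. The comparison helper only narrows down the candidates: deciding which of the surviving lines is really tied to the slot is still up to you.

```shell
#!/bin/sh
# Helpers for steps 2 to 5 (hypothetical names). Example session, where
# sdd and sdq stand for the old and new device names:
#   snapshot_disk sdd slot12-before.txt
#   # ...pull the disk, insert the replacement, find its name in the logs...
#   snapshot_disk sdq slot12-after.txt
#   diff slot12-before.txt slot12-after.txt
#   unchanged_lines slot12-before.txt slot12-after.txt

# Steps 2 and 4: save the resolved /sys path and the full udev attribute
# walk of one disk into a file.
snapshot_disk() {
    local dev=$1 out=$2
    {
        readlink -f "/sys/class/block/$dev"
        udevadm info --attribute-walk --name="/dev/$dev"
    } > "$out"
}

# Step 5: print (sorted, de-duplicated) the lines of the second snapshot
# that also appear verbatim in the first. A slot identifier, if one exists,
# must be among them. Sorting loses which parent device each line belongs
# to, so check candidates against the unsorted files, and rule out lines
# every disk shares by comparing with another slot's snapshot.
unchanged_lines() {
    grep -Fxf "$1" "$2" | sort -u
}
```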

If you find something that really didn't change, you're in luck. Then
either use the serial numbers, or unplug and replug the disks one by one,
to work out the mapping between slot number and that immutable item.
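For the serial-number route, something like the following can print every disk's name, serial number and /sys path in one go, so the serials written down per slot can be matched against the paths. It is a sketch: it assumes smartctl -i prints a "Serial number:" line (the label's capitalization is not the same for every drive type, hence the case-insensitive match), that it runs as root, and the function names are hypothetical.

```shell
#!/bin/sh
# Source this file, then run list_disks. Prints one tab-separated line per
# whole sd disk: device name, serial number, resolved /sys path.

# Read "smartctl -i" output on stdin and print the serial number.
parse_serial() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

list_disks() {
    local path dev serial
    for path in /sys/class/block/sd*; do
        [ -e "$path" ] || continue             # glob matched nothing
        [ -e "$path/partition" ] && continue   # skip partitions (sdb1, ...)
        dev=${path##*/}
        serial=$(smartctl -i "/dev/$dev" | parse_serial)
        printf '%s\t%s\t%s\n' "$dev" "${serial:-unknown}" "$(readlink -f "$path")"
    done
}
```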

Then write the udev rule. :)

Thanks!
JF



On 19/11/14 11:29, SCHAER Frederic wrote:
> Hi
> 
> Thanks.
> I hoped that would be it, but no ;)
> 
> With this mapping:
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdc -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdc
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdd -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdd
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sde -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sde
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdf -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:4/end_device-1:1:4/target1:0:5/1:0:5:0/block/sdf
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdg -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:5/end_device-1:1:5/target1:0:6/1:0:6:0/block/sdg
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdh -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:6/end_device-1:1:6/target1:0:7/1:0:7:0/block/sdh
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdi -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:7/end_device-1:1:7/target1:0:8/1:0:8:0/block/sdi
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdj -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:0/end_device-1:2:0/target1:0:9/1:0:9:0/block/sdj
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdk -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:1/end_device-1:2:1/target1:0:10/1:0:10:0/block/sdk
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdl -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:2/end_device-1:2:2/target1:0:11/1:0:11:0/block/sdl
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdm -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:3/end_device-1:2:3/target1:0:12/1:0:12:0/block/sdm
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdn -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:4/end_device-1:2:4/target1:0:13/1:0:13:0/block/sdn
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdo -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:5/end_device-1:2:5/target1:0:14/1:0:14:0/block/sdo
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdp -> ../../devices/pci0000:00/0000:00:04.0/0000:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:6/end_device-1:2:6/target1:0:15/1:0:15:0/block/sdp
> 
> sdd was in physical slot 12, sdk in slot 5, and sdg in slot 9 (I did not check the others), so the end_device numbering clearly does not follow the physical slot order.
> This cannot be put in production as is, and I'll have to find another way.
> 
> Regards
> 
> 
> -----Original Message-----
> From: Carl-Johan Schenström [mailto:carl-johan.schenstrom@xxxxx]
> Sent: Monday, November 17, 2014 14:14
> To: SCHAER Frederic; Scottix; Erik Logtenberg
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: RE: jbod + SMART: how to identify failing disks?
> 
> Hi!
> 
> I'm fairly sure that the link targets in /sys/class/block were correct the last time I had to change a drive on a system with a Dell HBA connected to an MD1000, but perhaps I was just lucky. =/
> 
> I.e.,
> 
> # ls -l /sys/class/block/sdj
> lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> ../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj
> 
> would be the first port on the HBA, the first expander, and the 7th slot (6, counting from 0). Don't take my word for it, though!
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com