Re: jbod + SMART : how to identify failing disks ?

I would say it depends on your system and where the drives are connected.
Some HBAs have a CLI tool to manage the attached drives, much like a
RAID card would.
Another method I found is that the kernel sometimes exposes the LEDs for
you: http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an
article on /sys/class/leds, but there's no guarantee.
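
If your system does expose something there, a sketch along these lines
may work (the <led-name> is a placeholder; what shows up under
/sys/class/leds, if anything, is entirely hardware-specific):

$ ls /sys/class/leds/
$ cat /sys/class/leds/<led-name>/max_brightness
$ echo 1 | sudo tee /sys/class/leds/<led-name>/brightness   # on
$ echo 0 | sudo tee /sys/class/leds/<led-name>/brightness   # off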

On my laptop I could turn lights on and off, but our server didn't
expose anything. This seems like a feature either Linux or smartctl
should have. I have run into this problem before and used a couple of
tricks to figure it out.

I guess the best solution is just to track the drives' serial numbers.
That might be a good note to add to the Ceph cluster documentation.
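
To collect them in one go, something like this may do (an untested
sketch; it assumes smartctl is installed and that all drives show up as
plain /dev/sd* devices):

for d in /dev/sd[a-z]; do
    printf '%s: ' "$d"
    sudo smartctl -i "$d" | awk -F: '/Serial Number/ {gsub(/ /,"",$2); print $2}'
done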

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
> I have no experience with the DELL SAS controller, but usually the
> advantage of using a simple controller (instead of a RAID card) is that
> you can use full SMART directly.
>
> $ sudo smartctl -a /dev/sda
>
> === START OF INFORMATION SECTION ===
> Device Model:     INTEL SSDSA2BW300G3H
> Serial Number:    PEPR2381003E300EGN
>
> Personally, I make sure that I know which serial number drive is in
> which bay, so I can easily tell which drive I'm talking about.
>
> So you can use SMART both to notice (pre)failing disks -and- to
> physically identify them.
>
> The same smartctl command also returns the health status like so:
>
> 233 Media_Wearout_Indicator 0x0032   099   099   000    Old_age   Always       -       0
>
> This specific SSD has 99% media lifetime left, so it's in the green. But
> it will continue to gradually degrade, and at some point it'll hit a
> percentage where I'd like to replace it. To keep an eye on the speed of
> decay, I graph those SMART values in Cacti. That way I can somewhat
> predict how long a disk will last, especially for SSDs, which die very
> gradually.
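>
> To pull just the normalized value out for graphing, something like this
> may work (a sketch; attribute 233 and its name are specific to these
> Intel SSDs, and other drives report wear differently):
>
> $ sudo smartctl -A /dev/sda | awk '/Media_Wearout_Indicator/ {print $4}'
> 099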
>
> Erik.
>
>
> On 12-11-14 14:43, JF Le Fillâtre wrote:
>>
>> Hi,
>>
>> May or may not work depending on your JBOD and the way it's identified
>> and set up by the LSI card and the kernel:
>>
>> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
>>
>> The weird path and the wildcards are due to the way sysfs is set up.
>>
>> That works with a Dell R520, 6Gb SAS HBA cards and Dell MD1200s, running
>> CentOS release 6.5.
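>>
>> To map every disk to its bay in one go, a quick loop like this may help
>> (an untested sketch, assuming the same sysfs layout):
>>
>> for d in /sys/block/sd*; do
>>     bay=$(cat "$d"/../../../../sas_device/end_device-*/bay_identifier 2>/dev/null)
>>     echo "$(basename "$d"): bay ${bay:-unknown}"
>> done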
>>
>> Note that you can make your life easier by writing a udev rule that
>> will create a symlink with a sane identifier for each of your external
>> disks. If you match along the lines of
>>
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
>>
>> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
>> in a script (with $1 being udev's $id after that match, i.e. the
>> string "end_device-X:Y:Z") to obtain the bay ID; a sketch follows below.
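>>
>> As a rough sketch (untested; the rule file name, the helper path and the
>> "bay" symlink directory are all made up for illustration):
>>
>> # /etc/udev/rules.d/99-bay-symlinks.rules
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*", PROGRAM="/usr/local/bin/bay_id.sh $id", SYMLINK+="bay/%c"
>>
>> # /usr/local/bin/bay_id.sh
>> #!/bin/sh
>> # $1 is the matched parent id, i.e. the string "end_device-X:Y:Z"
>> cat "/sys/class/sas_device/$1/bay_identifier"
>>
>> which would give you /dev/bay/<bay_id> symlinks pointing at the disks.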
>>
>> Thanks,
>> JF
>>
>>
>>
>> On 12/11/14 14:05, SCHAER Frederic wrote:
>>> Hi,
>>>
>>>
>>>
>>> I’m used to RAID software giving me the slots of failing disks, and most
>>> often blinking the disks in the disk bays.
>>>
>>> I recently installed a DELL “6GB HBA SAS” JBOD card, said to be an LSI
>>> 2008 one, and I now have to identify 3 pre-failed disks (so says
>>> S.M.A.R.T.).
>>>
>>>
>>>
>>> Since this is an LSI card, I thought I’d use MegaCli to identify the
>>> disk slots, but MegaCli does not see the HBA card.
>>>
>>> Then I found the LSI “sas2ircu” utility, but again, this one fails at
>>> giving me the disk slots (it finds the disks, serial numbers and so on,
>>> but the slot is always 0).
>>>
>>> Because of this, I’m going to head over to the disk bay and unplug the
>>> disk which I think corresponds to the alphabetical order in Linux, and
>>> see if it’s the correct one… But even if it’s correct this time, it
>>> might not be next time.
>>>
>>>
>>>
>>> But this makes me wonder: how do you guys, Ceph users, manage your
>>> disks if you really have JBOD servers?
>>>
>>> I can’t imagine having to guess slots like that each time, and I can’t
>>> imagine creating serial number stickers for every single disk I might
>>> have to manage either…
>>>
>>> Is there any specific advice regarding which JBOD cards people should
>>> (or should not) use in their systems?
>>>
>>> Any magical way to “blink” a drive in Linux?
>>>
>>>
>>>
>>> Thanks && regards
>>>
>>>
>>>



-- 
Follow Me: @Taijutsun
Scottix@xxxxxxxxx