Hi!

I'm fairly sure that the link targets in /sys/class/block were correct the last time I had to change a drive on a system with a Dell HBA connected to an MD1000, but perhaps I was just lucky. =/

I.e.,

# ls -l /sys/class/block/sdj
lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> ../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj

would be the first port on the HBA, first expander, 7th slot (6, counting from 0).

Don't take my word for it, though!

-- 
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst / The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenstrom@xxxxx / +46 709 116769

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of SCHAER Frederic <frederic.schaer@xxxxxx>
Sent: Friday, November 14, 2014 17:24
To: Scottix; Erik Logtenberg
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: jbod + SMART : how to identify failing disks ?

Hi,

Thanks for your replies :]

Indeed, I did not think about /sys/class/leds, but unfortunately I have nothing in there on my systems.
This is kernel related, so I presume it would be the module's duty to expose LEDs there (in my case, mpt2sas) ... that would indeed be welcome!

/sys/block is not of much help either, unfortunately.
The last thing I haven't tried is to compile the Dell driver and use it instead of the kernel one (sigh), or to try an elrepo kernel...

[root@ceph0 ~]# cat /sys/block/sd*/../../../../sas_device/end_device-*/bay_identifier
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Maybe a kernel bug...
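The sysfs link target quoted at the top of this message can also be decoded in a script rather than by eye. What follows is a minimal Python sketch, not a tested tool: the function name describe_sas_path and the returned field names are made up for illustration, and reading the last number of end_device-H:E:P as the slot follows the interpretation suggested above, which may not hold on every HBA, enclosure or kernel.

```python
import re

# Matches names such as "end_device-7:2:6" inside a resolved sysfs path.
END_DEVICE_RE = re.compile(r"end_device-(\d+):(\d+):(\d+)")

# Link target copied from the `ls -l /sys/class/block/sdj` output in this thread.
EXAMPLE_TARGET = (
    "../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0/host7/port-7:0/"
    "expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/"
    "target7:0:7/7:0:7:0/block/sdj"
)


def describe_sas_path(link_target):
    """Pull the SAS topology numbers out of a sysfs block link target.

    Returns None when the path has no three-part end_device component,
    for example for a non-SAS disk.
    """
    match = END_DEVICE_RE.search(link_target)
    if match is None:
        return None
    host, expander_id, slot = (int(part) for part in match.groups())
    return {
        "end_device": match.group(0),
        "host": host,
        "expander_id": expander_id,
        # Assumption: the last number is the slot, as suggested in the thread.
        "slot": slot,
        # Every expander the path passes through, in order.
        "expanders_in_path": re.findall(r"/(expander-[\d:]+)", link_target),
    }


if __name__ == "__main__":
    print(describe_sas_path(EXAMPLE_TARGET))
```

On a live system the input would typically come from os.path.realpath("/sys/class/block/sdj"). The end_device name it yields is also the directory name under /sys/class/sas_device/ whose bay_identifier file is discussed in this thread, so the two pieces of information can be compared for each disk.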
Regards

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of Scottix
Sent: Wednesday, November 12, 2014 18:43
To: Erik Logtenberg
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: jbod + SMART : how to identify failing disks ?

I would say it depends on your system and where the drives are connected.
Some HBAs have a CLI tool to manage the connected drives, like a RAID card would.
Another method I found is that sometimes the kernel will expose the LEDs for you; http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an article on /sys/class/leds, but there is no guarantee.
On my laptop I could turn on lights and stuff, but our server didn't have anything.

Seems like a feature either Linux or smartctl should have.
I have run into this problem before, but did a couple of tricks to figure it out.

I guess the best solution is just to track the drives' S/N.
Maybe a good note to have in the doc for a Ceph cluster to be aware of.

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg <erik@xxxxxxxxxxxxx> wrote:
> I have no experience with the DELL SAS controller, but usually the
> advantage of using a simple controller (instead of a RAID card) is that
> you can use full SMART directly.
>
> $ sudo smartctl -a /dev/sda
>
> === START OF INFORMATION SECTION ===
> Device Model: INTEL SSDSA2BW300G3H
> Serial Number: PEPR2381003E300EGN
>
> Personally, I make sure that I know which serial number drive is in
> which bay, so I can easily tell which drive I'm talking about.
>
> So you can use SMART both to notice (pre)failing disks -and- to
> physically identify them.
>
> The same smartctl command also returns the health status like so:
>
> 233 Media_Wearout_Indicator 0x0032 099 099 000 Old_age Always - 0
>
> This specific SSD has 99% media lifetime left, so it's in the green. But
> it will continue to gradually degrade, and at some time it'll hit a
> percentage where I like to replace it.
> To keep an eye on the speed of
> decay, I'm graphing those SMART values in Cacti. That way I can somewhat
> predict how long a disk will last, especially SSDs, which die very
> gradually.
>
> Erik.
>
>
> On 12-11-14 14:43, JF Le Fillâtre wrote:
>>
>> Hi,
>>
>> May or may not work depending on your JBOD and the way it's identified
>> and set up by the LSI card and the kernel:
>>
>> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
>>
>> The weird path and the wildcards are due to the way the sysfs is set up.
>>
>> That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
>> CentOS release 6.5.
>>
>> Note that you can make your life easier by writing a udev script that
>> will create a symlink with a sane identifier for each of your external
>> disks. If you match along the lines of
>>
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
>>
>> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
>> in a script (with $1 being the $id of udev after that match, so the
>> string "end_device-X:Y:Z") to obtain the bay ID.
>>
>> Thanks,
>> JF
>>
>>
>> On 12/11/14 14:05, SCHAER Frederic wrote:
>>> Hi,
>>>
>>> I’m used to RAID software giving me the failing disks' slots, and most
>>> often blinking the disks on the disk bays.
>>>
>>> I recently installed a DELL “6GB HBA SAS” JBOD card, said to be an LSI
>>> 2008 one, and I now have to identify 3 pre-failed disks (so says
>>> S.M.A.R.T).
>>>
>>> Since this is an LSI, I thought I’d use MegaCli to identify the disks'
>>> slots, but MegaCli does not see the HBA card.
>>>
>>> Then I found the LSI “sas2ircu” utility, but again, this one fails at
>>> giving me the disk slots (it finds the disks, serials and others, but
>>> slot is always 0).
>>>
>>> Because of this, I’m going to head over to the disk bay and unplug the
>>> disk which I think corresponds to the alphabetical order in linux, and
>>> see if it’s the correct one….
>>> But even if this is correct this time, it
>>> might not be next time.
>>>
>>> But this makes me wonder : how do you guys, Ceph users, manage your
>>> disks if you really have JBOD servers ?
>>>
>>> I can’t imagine having to guess slots like that each time, nor can I
>>> imagine creating serial number stickers for every single disk I
>>> could have to manage …
>>>
>>> Is there any specific advice regarding JBOD cards people should (not)
>>> use in their systems ?
>>>
>>> Any magical way to “blink” a drive in linux ?
>>>
>>> Thanks && regards
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Follow Me: @Taijutsun
Scottix@xxxxxxxxx
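
Several replies in this thread settle on tracking serial numbers per bay. A minimal Python sketch of that bookkeeping follows, under stated assumptions: it only parses text in the shape of the smartctl excerpts quoted in this thread, the function names are illustrative, and BAY_BY_SERIAL is a hypothetical table you would maintain by hand (the bay label in it is invented, not taken from any real system).

```python
# Text shaped like the `smartctl -a` excerpts quoted in this thread.
SAMPLE_OUTPUT = """\
=== START OF INFORMATION SECTION ===
Device Model:     INTEL SSDSA2BW300G3H
Serial Number:    PEPR2381003E300EGN

233 Media_Wearout_Indicator 0x0032   099   099   000    Old_age   Always       -       0
"""

# Hypothetical hand-maintained inventory: serial number -> physical bay label.
BAY_BY_SERIAL = {"PEPR2381003E300EGN": "enclosure 1, bay 3"}


def parse_smartctl_identity(output):
    """Extract the model and serial number lines from smartctl text output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Device Model", "Serial Number"):
            fields[key.strip()] = value.strip()
    return fields


def parse_attribute_value(output, attribute_name):
    """Return the normalized VALUE column of one SMART attribute, or None."""
    for line in output.splitlines():
        columns = line.split()
        # Attribute rows look like: ID# NAME FLAG VALUE WORST THRESH ...
        if len(columns) >= 6 and columns[0].isdigit() and columns[1] == attribute_name:
            return int(columns[3])
    return None


if __name__ == "__main__":
    identity = parse_smartctl_identity(SAMPLE_OUTPUT)
    serial = identity.get("Serial Number")
    wear = parse_attribute_value(SAMPLE_OUTPUT, "Media_Wearout_Indicator")
    print(serial, BAY_BY_SERIAL.get(serial, "unknown bay"), wear)
```

On a real host the text would come from running smartctl as root, for example via subprocess.run(["smartctl", "-a", "/dev/sda"], capture_output=True, text=True).stdout. The attribute value could then be fed to whatever graphing tool is in use, as described for Cacti earlier in the thread.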