Re: FailSpare event?

On Fri, 12 Jan 2007, Neil Brown might have said:

> On Thursday January 11, mikee@xxxxxxxxxxxx wrote:
> > 
> > So I'm ok for the moment? Yes, I need to find the error and fix everything
> > back to the (S) state.
> 
> Yes, OK for the moment.
> 
> > 
> > The messages in $HOST:/var/log/messages for the time of the email are:
> > 
> > Jan 11 16:04:25 elo kernel: sd 2:0:4:0: SCSI error: return code = 0x8000002
> > Jan 11 16:04:25 elo kernel: sde: Current: sense key: Hardware Error
> > Jan 11 16:04:25 elo kernel:     Additional sense: Internal target failure
> > Jan 11 16:04:25 elo kernel: Info fld=0x10b93c4d
> > Jan 11 16:04:25 elo kernel: end_request: I/O error, dev sde, sector 280575053
> > Jan 11 16:04:25 elo kernel: raid5: Disk failure on sde2, disabling device. Operation continuing on 5 devices
> 
> Given the sector number it looks likely that it was a superblock
> update.
> No idea how bad an 'internal target failure' is.  Maybe powercycling
> the drive would 'fix' it, maybe not.
> 
> > 
> > On AIX boxes I can blink the drives to identify a bad/failing device. Is there
> > a way to blink the drives in linux?
> 
> Unfortunately not.
> 
> NeilBrown
> 
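
One thing that can be checked directly from the kernel messages above: the
"Info fld" value in the sense data is the failing sector in hex, and it matches
the sector reported by end_request. The sketch below does that conversion. It
also shows how the superblock position could be computed to test the
"superblock update" guess, ASSUMING the array uses version 0.90 metadata (not
stated in this thread), where the superblock sits in the last 64 KiB-aligned
64 KiB block of the member device. The function names are made up for
illustration.

```shell
#!/bin/sh
# Convert the hex "Info fld" value from the SCSI sense data to decimal so it
# can be compared with the sector number printed by end_request.
info_fld_to_sector() {
    printf '%d\n' "$1"
}

# Offset, in 512-byte sectors from the start of the member device, of a
# version 0.90 md superblock: round the device size down to a 64 KiB
# boundary (128 sectors), then step back one more 64 KiB.
# ASSUMPTION: 0.90 metadata; other metadata versions live elsewhere.
md090_sb_offset() {
    size_sectors=$1
    echo $(( (size_sectors & ~127) - 128 ))
}

# Value taken from the log above; prints the decimal sector.
info_fld_to_sector 0x10b93c4d
```

To compare against the failing sector, the start and size of sde2 in sectors
would be needed (they are not shown here); the absolute superblock sector
would then be the partition start plus `md090_sb_offset <size>`.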

I found the smartctl command. I have a 'long' test running in the background.
I checked this drive and the other drives. This drive has been used the least
(confirms it is a spare?) and is the only one with 'Total uncorrected errors' > 0.

How do I determine what the error is, correct it, or clear it?

Mike

[root@$HOST ~]# smartctl -a /dev/sde
smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: SEAGATE  ST3146707LC      Version: D703
Serial number: 3KS30WY8
Device type: disk
Transport protocol: Parallel SCSI (SPI-4)
Local Time is: Thu Jan 11 17:00:26 2007 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     48 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 66108
  Blocks received from initiator = 147374656
  Blocks read from cache and sent to initiator = 42215
  Number of read and write commands whose size <= segment size = 12635583
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 3943.42
  number of minutes until next internal SMART test = 94

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:        354        0         0       354        354          0.546           0
write:         0        0         0         0          0        185.871           1

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed, segment failed   -    3943                 - [-   -    -]

Long (extended) Self Test duration: 2726 seconds [45.4 minutes]
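
For comparing several drives, the check on the 'Total uncorrected errors'
column can be scripted rather than read by eye. This is only a rough sketch:
it assumes the SCSI "Error counter log" layout shown above, where the
uncorrected count is the last field of the read:/write:/verify: rows, and the
function name is made up for illustration.

```shell
#!/bin/sh
# Read `smartctl -a` output on stdin and print each operation whose
# "Total uncorrected errors" count (last field of the row) is non-zero.
uncorrected_errors() {
    awk '/^(read|write|verify):/ && $NF + 0 > 0 {
        sub(":", "", $1)   # "write:" -> "write"
        print $1, $NF
    }'
}

# Demo using the two rows from the output above.
uncorrected_errors <<'EOF'
read:        354        0         0       354        354          0.546           0
write:         0        0         0         0          0        185.871           1
EOF
```

Against a live drive it would be used as `smartctl -a /dev/sde | uncorrected_errors`,
repeated for each member disk of the array.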

