Re: Pending sectors in valid array - how to proceed?

Simon Matthews <simon.d.matthews@xxxxxxxxx> · Thu, 29 Jul 2010 21:24:52 -0700

On Wed, Jul 28, 2010 at 7:50 PM, Simon Matthews
<simon.d.matthews@xxxxxxxxx> wrote:

>
> I am waiting for this drive to get to the point that Seagate will accept an RMA:
>
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   089   076   006    Pre-fail  Always       -       173224741
>   3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       69
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2002
>   7 Seek_Error_Rate         0x000f   046   036   030    Pre-fail  Always       -       42786857552386
>   9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       16170
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       5
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       69
> 184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   012   012   000    Old_age   Always       -       88
> 188 Unknown_Attribute       0x0032   100   090   000    Old_age   Always       -       112
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   064   057   045    Old_age   Always       -       36 (Lifetime Min/Max 33/43)
> 194 Temperature_Celsius     0x0022   036   043   000    Old_age   Always       -       36 (0 10 0 0)
> 195 Hardware_ECC_Recovered  0x001a   031   020   000    Old_age   Always       -       173224741
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
>
>
> It is a desktop drive and serves as one half of several RAID1 arrays,
> but so far it hasn't been kicked out of any of them. I have run a check
> several times in the last few days. I had expected it to show a
> failing state when the reallocated sector count reached 2000, but it
> hasn't.
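
A possible explanation for the missing failing state: SMART health is
generally judged on the normalized VALUE column against THRESH, not on
RAW_VALUE. In the table above Reallocated_Sector_Ct still reads VALUE 100
against THRESH 036 even with a raw count of 2002. The sketch below applies
that comparison to rows of "smartctl -A" output. The function name
failing_attributes and the simple "VALUE <= THRESH" rule are assumptions
made for illustration, not smartctl's exact logic, and each row is assumed
to sit on a single unwrapped line.

```python
# Two rows copied from the attribute table in this thread (unwrapped).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2002
  7 Seek_Error_Rate         0x000f   046   036   030    Pre-fail  Always       -       42786857552386
"""


def failing_attributes(smart_text):
    """Return names of attributes whose normalized VALUE is at or below THRESH.

    Hypothetical helper: a simplified reading of the table, not smartctl's code.
    """
    failing = []
    for line in smart_text.splitlines():
        # Drop any mail quote prefix ("> ") and leading padding, then split columns.
        parts = line.lstrip("> ").split()
        # Attribute rows start with a numeric ID and have a hex FLAG in column 3.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        value, thresh = int(parts[3]), int(parts[5])
        if value <= thresh:
            failing.append(parts[1])
    return failing


if __name__ == "__main__":
    print(failing_attributes(SAMPLE))
```

Neither sample row crosses its threshold, which is consistent with the
drive still being reported as OK despite the large raw reallocation count.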

Well, despite the S.M.A.R.T. data showing that the drive is OK, it
appears to have failed completely this evening. The disk is now totally
inaccessible.

From the logs:
Jul 29 20:39:56 server2 kernel: ata1: failed to read log page 10h (errno=-5)
Jul 29 20:40:48 server2 kernel: ata1.00: exception Emask 0x1 SAct 0x403ffff7 SErr 0x0 action 0x0
Jul 29 20:40:48 server2 kernel: ata1.00: irq_stat 0x40000008
Jul 29 20:40:48 server2 kernel: ata1.00: cmd 60/80:00:e8:2b:5a/00:00:47:00:00/40 tag 0 ncq 65536 in
Jul 29 20:40:48 server2 kernel:          res 40/00:a8:e8:35:5a/3a:00:47:00:00/40 Emask 0x1 (device error)
Jul 29 20:40:48 server2 kernel: ata1.00: status: { DRDY }
Jul 29 20:40:48 server2 kernel: ata1.00: cmd 60/80:08:e8:2f:5a/00:00:47:00:00/40 tag 1 ncq 65536 in
Jul 29 20:40:48 server2 kernel:          res 40/00:a8:e8:35:5a/00:00:47:00:00/40 Emask 0x1 (device error)
Jul 29 20:40:48 server2 kernel: ata1.00: status: { DRDY }
Jul 29 20:40:48 server2 kernel: ata1.00: cmd 60/80:10:e8:33:5a/00:00:47:00:00/40 tag 2 ncq 65536 in
Jul 29 20:40:48 server2 kernel:          res 40/00:a8:e8:35:5a/00:00:47:00:00/40 Emask 0x1 (device error)
...
Jul 29 20:40:48 server2 kernel: ata1.00: status: { DRDY }
Jul 29 20:40:48 server2 kernel: ata1.00: qc timeout (cmd 0xec)
Jul 29 20:40:48 server2 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Jul 29 20:40:48 server2 kernel: ata1.00: revalidation failed (errno=-5)
Jul 29 20:40:48 server2 kernel: ata1: hard resetting link
Jul 29 20:40:48 server2 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 29 20:40:48 server2 kernel: ata1.00: qc timeout (cmd 0xa1)
Jul 29 20:40:48 server2 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Jul 29 20:40:48 server2 kernel: ata1.00: revalidation failed (errno=-5)
Jul 29 20:40:48 server2 kernel: ata1: limiting SATA link speed to 1.5 Gbps
Jul 29 20:40:48 server2 kernel: ata1: hard resetting link
Jul 29 20:40:48 server2 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 29 20:40:48 server2 kernel: ata1.00: qc timeout (cmd 0xa1)
Jul 29 20:40:48 server2 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Jul 29 20:40:48 server2 kernel: ata1.00: revalidation failed (errno=-5)
Jul 29 20:40:48 server2 kernel: ata1.00: disabled
Jul 29 20:40:48 server2 kernel: ata1: hard resetting link
Jul 29 20:40:48 server2 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jul 29 20:40:48 server2 kernel: ata1: EH complete
Jul 29 20:40:48 server2 kernel: sd 0:0:0:0: [sda] Unhandled error code
Jul 29 20:40:48 server2 kernel: sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
Jul 29 20:40:48 server2 kernel: end_request: I/O error, dev sda, sector 1197091688
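
To work out where on the disk the failing request landed, the device and
sector can be pulled out of the final end_request line. This is only a
rough sketch: the helper name io_error_sectors is made up for
illustration, and the byte offset relies on the kernel reporting these
sector numbers in 512-byte units.

```python
import re

# Matches "I/O error, dev sda, sector 1197091688", including the case
# where the mail client wrapped the line between "sda," and "sector".
IO_ERROR = re.compile(r"I/O error, dev (\w+),\s+sector (\d+)")

SECTOR_BYTES = 512  # unit used for sector numbers in these kernel messages

LOG = """\
Jul 29 20:40:48 server2 kernel: end_request: I/O error, dev sda,
sector 1197091688
"""


def io_error_sectors(log_text):
    """Return (device, sector) pairs for every block I/O error in the log text.

    Hypothetical helper written for this thread's log format.
    """
    return [(dev, int(sector)) for dev, sector in IO_ERROR.findall(log_text)]


if __name__ == "__main__":
    for dev, sector in io_error_sectors(LOG):
        offset = sector * SECTOR_BYTES
        print(f"{dev}: sector {sector}, byte offset {offset}")
```

Comparing the sector against the partition table would show which
partition, and so which RAID1 array, the failed read belonged to.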

I guess Seagate will accept it for an RMA now!

Simon