Re: read errors corrected

James <jtp@xxxxxxxxx> · Thu, 30 Dec 2010 11:41:18 -0500

Inline.

On Thu, Dec 30, 2010 at 05:13, Giovanni Tessore <giotex@xxxxxxxxxx> wrote:
> On 12/30/2010 04:20 AM, James wrote:
>>
>> Can someone point me in the right direction?
>> (a) what causes these errors precisely?
>> (b) is the error benign? How can I determine if it is *likely* a
>> hardware problem? (I imagine it's probably impossible to tell if it's
>> HW until it's too late)
>> (c) are these errors expected in a RAID array that is heavily used?
>> (d) what kind of errors should I see regarding "read errors" that
>> *would* indicate an imminent hardware failure?
>
> (a) these errors usually come from defective disk sectors. raid reconstructs
> the missing sector from parity on the other disks in the array, then rewrites
> the sector on the defective disk; if the sector is rewritten without error
> (maybe the hd remaps the sector into its reserved area), then only the log
> message is displayed.
>
> (b) with raid-6 it's almost benign; to get into trouble you would need a read
> error on the same sector on >2 disks; or have 2 disks failed and out of the
> array and get a read error on one of the other disks while reconstructing the
> array; or have 1 disk failed and get a read error on the same sector on >1 disk
> while reconstructing (with raid-5 it's much more dangerous, as you can
> have big trouble if a disk fails and you get a read error on another disk
> while reconstructing; that happened to me!)
>
> (c) no; it's also a good rule to perform a periodic scrub of the array
> (check of the array), to reveal and correct defective sectors
>
> (d) check smart status of the disks, for "reallocated sector count"; also if
> the md superblock version is >= 1 there is a persistent count of corrected read
> errors for each device in /sys/block/mdXX/md/dev-XX/errors; when this counter
> reaches 256 the disk is marked failed; imho when a disk is giving even a few
> corrected read errors in a short interval it's better to replace it.

Good call.
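
For anyone following along, here is a minimal sketch of how the per-device
counters mentioned in (d) could be listed, assuming the
/sys/block/mdXX/md/dev-XX/errors layout described above. The function name
md_read_errors and the SYSFS_ROOT override are my own inventions for
illustration, not part of md or mdadm.

```shell
# List the corrected-read-error counter md keeps for each member device.
# SYSFS_ROOT is a hypothetical override so the function can be tried against
# a scratch directory; on a real system leave it unset.
md_read_errors() {
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/block/md*/md/dev-*/errors; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        dev=${f%/errors}                 # .../block/md0/md/dev-sda1
        dev=${dev##*/dev-}               # sda1
        array=${f#"$root"/block/}        # md0/md/dev-sda1/errors
        array=${array%%/*}               # md0
        # One line per member: array name, member device, counter value
        printf '%s %s %s\n' "$array" "$dev" "$(cat "$f")"
    done
}
```

Calling md_read_errors with no arguments prints one "array device count" line
per member, which makes it easy to watch whether a counter is climbing over a
short interval.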

Here's the output of the reallocated sector count:

~ # for i in a b c d ; do smartctl -a /dev/sd$i | grep Realloc ; done
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       5
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1

Are these values high? Low? Acceptable?

How about values like "Raw_Read_Error_Rate" and "Seek_Error_Rate" -- I
believe I've read those are values that are normally very high...is
this true?

~ # for i in a b c d ; do smartctl -a /dev/sd$i | grep Raw_Read_Error_Rate ; done
  1 Raw_Read_Error_Rate     0x000f   116   099   006    Pre-fail  Always       -       106523474
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       77952706
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       137525325
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       179042738

...and...

~ # for i in a b c d ; do smartctl -a /dev/sd$i | grep Seek_Error_Rate ; done
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       14923821
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       15648709
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       15733727
  7 Seek_Error_Rate         0x000f   071   060   030    Pre-fail  Always       -       14279452
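
In case it helps whoever replies, here is a rough sketch that pulls the three
attributes above out in one pass and shows how far the normalized VALUE sits
above THRESH. As far as I understand it, smartctl reports an attribute as
failing once the normalized value drops to or below the threshold, while the
RAW_VALUE encoding is vendor-specific, so a large raw number is not by itself
a count of errors. The function name smart_margin is made up for this example,
and it assumes the usual ten-column attribute table with one row per line.

```shell
# Read `smartctl -A` output on stdin and summarize attributes 1, 5 and 7.
# Assumed columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
smart_margin() {
    # margin = normalized VALUE minus THRESH; zero or negative means failing
    awk '$1 ~ /^[0-9]+$/ && ($1 == 1 || $1 == 5 || $1 == 7) {
        printf "%s value=%d worst=%d thresh=%d margin=%d raw=%s\n",
               $2, $4, $5, $6, $4 - $6, $10
    }'
}
```

Used in the same loop as above:

~ # for i in a b c d ; do smartctl -A /dev/sd$i | smart_margin ; done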

Thoughts appreciated.

> --
> Yours faithfully.
>
> Giovanni Tessore
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>