Re: RAID scrub generated errors, should I worry?

Phil Turmel <philip@xxxxxxxxxx> · Mon, 02 Mar 2015 16:09:02 -0500

On 03/02/2015 12:43 PM, Thomas Fjellstrom wrote:
> On Mon 02 Mar 2015 04:22:00 PM Mikael Abrahamsson wrote:
>> On Mon, 2 Mar 2015, Wilson, Jonathan wrote:
>>> While the monthly scrub was running the following errors (at the bottom
>>> of the post, copied from syslog) were issued.
>>
>> As soon as you get UNC, it's the drive reporting that it can't
>> successfully read a sector. Usually this sector is then reported as
>> "pending" in your SMART output.
>>
>> Since the log you provided shows a lot of sectors being corrected, and you
>> have 0 pending sectors on the drive afterwards, I'd say you are now fine.
>> I would run a new scrub manually in a few days just to check, but you
>> might be fine going forward. There is no really good way to know, but
>> generally, a drive that throws a bunch of UNC should be monitored so this
>> isn't becoming a common problem. I tend to replace drives that have thrown
>> these kinds of errors if it happens on any kind of regular basis.
> 
> Dumb question, but after pending, I assume they go into the reallocated 
> column? I think after a certain number of those, you should start thinking 
> about a replacement. Like with my recent issues, I had two drives with a few 
> too many reallocated sectors. One was over 16k and the other was over 32k. 
> They still "work", but I replaced them with WD Reds anyhow. Another drive 
> seemed to max out the start-stop count field at 65536. Hah. No more cheap 
> desktop Seagates in RAID for this fellow.

If the URE was simply due to magnetic decay without actual damage, you
can expect MD to rewrite the sector from redundancy and fix it: no more
pending, no relocation.  If the spot on the media is truly failing, the
rewrite-and-recheck that the drive performs on pending sectors will
expose the problem, and the firmware will relocate the sector.
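
To make the two outcomes concrete, here is a rough sketch (not a tool I
actually run) that compares SMART counts taken before and after a scrub.
The names SmartCounts and classify are made up for illustration; the
counts are assumed to be the raw values of attribute 197
(Current_Pending_Sector) and attribute 5 (Reallocated_Sector_Ct) as
smartctl usually labels them.

```python
from typing import NamedTuple


class SmartCounts(NamedTuple):
    """Hypothetical container for two raw SMART values."""
    pending: int      # attribute 197, Current_Pending_Sector
    reallocated: int  # attribute 5, Reallocated_Sector_Ct


def classify(before: SmartCounts, after: SmartCounts) -> str:
    """Describe what happened to pending sectors between two snapshots."""
    cleared = before.pending - after.pending
    relocated = after.reallocated - before.reallocated
    if relocated > 0:
        # The rewrite exposed a bad spot and the firmware remapped it.
        return "relocated"
    if cleared > 0:
        # Pending count dropped with no remap: the rewrite simply worked.
        return "rewritten in place"
    if after.pending > 0:
        # Nothing has rewritten these sectors yet.
        return "still pending"
    return "no change"
```

For example, going from 8 pending and 0 reallocated to 0 and 0 would be
classified as "rewritten in place", which is the benign case.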

Read errors like this are normal and expected.  The drive data shows
10k+ hours of operation, so the honeymoon (no errors at all) is over.
Scrub weekly or monthly so these UREs don't accumulate, and carry on.
When actual *relocations* climb into double digits, replace the drive.
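
As a sketch of that routine, assuming the array is md0 (a placeholder)
and that you feed in the text printed by "smartctl -A" for the drive:
start_scrub writes "check" to the array's sync_action file in sysfs,
which needs root, and should_replace applies the double-digit rule of
thumb.  The function names and the limit constant are my own; the
parser assumes the usual ten-column attribute table with the raw value
in the last column, which may not hold for every drive.

```python
from pathlib import Path

RELOCATION_LIMIT = 10  # "double digits"; a rule of thumb, not a spec


def start_scrub(md_name: str = "md0") -> None:
    """Ask MD to scrub the array. Needs root; md0 is a placeholder."""
    Path(f"/sys/block/{md_name}/md/sync_action").write_text("check\n")


def raw_value(smartctl_output: str, attribute: str) -> int:
    """Return the RAW_VALUE column for one named SMART attribute."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Assumed layout: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[1] == attribute:
            return int(fields[9])
    raise KeyError(attribute)


def should_replace(smartctl_output: str,
                   limit: int = RELOCATION_LIMIT) -> bool:
    """True once the reallocated sector count reaches the limit."""
    return raw_value(smartctl_output, "Reallocated_Sector_Ct") >= limit
```

Pending sectors are deliberately left out of should_replace, since a
scrub is expected to clear those; only actual relocations count.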

HTH,

Phil
