Re: Help with data recovery - RAID6 with 2 failed drives and another with broken sectors

Phil Turmel <philip@xxxxxxxxxx> · Sun, 06 Oct 2013 18:15:22 -0400

On 10/06/2013 06:11 PM, Michał Sawicz wrote:
> On 06.10.2013 23:44, Phil Turmel wrote:
>> The answer is *NO*.  That is not expected.  But it does happen with
>> timeout mismatches, and the double failure you experienced is a common
>> result of error correction timeout mismatch.  Timeout mismatch is where
>> your drives are internally trying to retry reading a bad sector long
>> after the OS has given up.  It is always associated with consumer-grade
>> hard drives in raid arrays.
> 
> Right, I knew that consumer HDDs did that, but didn't expect this to
> cause such mayhem. So the takeaway for me is: as soon as you
> see bad blocks on the drive, fail it, otherwise the whole array will
> probably get kicked out sooner or later. Or try and manually force the
> drive to reallocate, and then do a scrub.

No, just fix the timeouts.  Otherwise, you'll be kicking drives out
*way* more often than you think.
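
For anyone reading this in the archives, here is a minimal sketch of what
"fix the timeouts" usually means in practice. It assumes the values commonly
quoted on this list (7.0 seconds of drive error recovery, i.e. 70 deciseconds,
or a 180 second kernel driver timeout when the drive has no SCT ERC support).
The helper name timeout_fix_commands and its parameters are made up for
illustration. It only builds the command strings; run them yourself as root
after checking "smartctl -l scterc /dev/sdX" on each array member.

```python
def timeout_fix_commands(device, supports_scterc,
                         erc_deciseconds=70, driver_timeout_s=180):
    """Return the shell command(s) that address timeout mismatch for one drive.

    device          -- block device path, e.g. "/dev/sda"
    supports_scterc -- True if `smartctl -l scterc <device>` shows the drive
                       accepts SCT Error Recovery Control settings
    The default values are assumptions taken from common list advice.
    """
    name = device.rsplit("/", 1)[-1]  # "/dev/sda" -> "sda"
    if supports_scterc:
        # Make the drive give up on a bad sector before the kernel's
        # default 30 second command timeout expires.
        return [f"smartctl -l scterc,{erc_deciseconds},{erc_deciseconds} {device}"]
    # The drive cannot limit its own retries, so make the kernel wait
    # longer than the drive's internal error recovery can take.
    return [f"echo {driver_timeout_s} > /sys/block/{name}/device/timeout"]


if __name__ == "__main__":
    # Hypothetical array members: one drive with SCT ERC, one without.
    for dev, has_erc in [("/dev/sda", True), ("/dev/sdb", False)]:
        for cmd in timeout_fix_commands(dev, has_erc):
            print(cmd)
```

Neither setting is normally persistent: the sysfs timeout resets on reboot and
many drives forget the scterc setting on a power cycle, so the commands usually
go into a boot script or udev rule.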

Do check your smartctl reports for actual reallocated sectors, though.  In
my experience, once the count passes single digits, further failures follow
rapidly.
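
As a rough sketch of that check: the snippet below pulls the raw value of
attribute 5 (Reallocated_Sector_Ct) out of "smartctl -A" style output and flags
counts of 10 or more. The sample text is invented for illustration, not taken
from a real drive, and the function names are hypothetical. It assumes the
usual ten-column attribute table layout with the raw value in the last column.

```python
# Invented sample in the usual `smartctl -A` table layout (not real drive data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
"""


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if (len(fields) >= 10 and fields[0] == "5"
                and fields[1] == "Reallocated_Sector_Ct"):
            return int(fields[9])
    return None


def past_single_digits(count):
    """True once the reallocation count has reached double digits."""
    return count is not None and count >= 10


if __name__ == "__main__":
    count = reallocated_sectors(SAMPLE)
    print(count, past_single_digits(count))
```

On a live system the input would come from running "smartctl -A /dev/sdX" as
root for each member drive.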

>> You might want to search the list archives for various combinations of
>> "error recovery", "scterc", "URE" and "timeout mismatch" for a full
>> description of the problem and the recommended ways to avoid it.
> 
> Thanks, will do.

Phil