Re: RAID1 scrub ignoring read errors?

Phil Turmel <philip@xxxxxxxxxx> · Mon, 3 Dec 2018 01:35:33 -0500

On 12/2/18 6:32 PM, Wol's lists wrote:
> On 02/12/2018 22:00, Niklas Hambüchen wrote:
>> This makes sense.
>>
>> But does it apply here, given the flood of read errors in my dmesg in
>> just a single scrub?
>> The probability for that many errors for a single pass over 3 GB seems
>> very low.

True enough.  And your WD Reds are not timeout hazards.  Is there any
chance you are having power supply problems?  The last time I saw
something like this, it was a failing second 12V rail on a dual-rail PSU.

Look closely at "pending sector relocations" in the detailed smartctl
attributes.  Large numbers suggest that your corrections aren't happening
when they should, and/or that lots of transient UREs are occurring.
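If you want to watch those counters across repeated scrubs, here is a minimal sketch that pulls the raw values out of `smartctl -A` output.  It assumes the drive reports the usual ATA attribute names (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable) and the standard 10-column attribute table; the helper names are hypothetical, not part of smartmontools.

```python
import subprocess

# Assumption: the drive firmware uses these standard ATA attribute names.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def read_smart(device: str) -> str:
    """Run `smartctl -A` on a device (needs smartmontools, usually root)."""
    return subprocess.run(["smartctl", "-A", device],
                          capture_output=True, text=True).stdout


def parse_smart_attrs(text: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for each numeric attribute row."""
    attrs: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows look like: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
        #                 UPDATED WHEN_FAILED RAW_VALUE [extra...]
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and non-attribute lines
        raw = fields[9]
        if raw.isdigit():
            attrs[fields[1]] = int(raw)
    return attrs


def watched_counts(text: str) -> dict[str, int]:
    """Keep only the sector-health counters worth tracking between scrubs."""
    attrs = parse_smart_attrs(text)
    return {name: attrs[name] for name in WATCHED if name in attrs}
```

Usage would be something like `watched_counts(read_smart("/dev/sda"))` before and after a scrub; a pending count that keeps climbing while the reallocated count stays flat fits the "corrections aren't happening" case.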

> I just thought. Is your scrub a "check" or "repair"? I don't think a
> check actually rewrites, so failures can accumulate. If a sector becomes
> unreadable because the magnetism has faded, it will fail repeatedly
> until it's rewritten. And without a "repair" it won't necessarily be
> rewritten.

Yes, "check" does rewrite UREs.  "repair" additionally rewrites any
parity, Q syndrome, or mirror copies that don't match.
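For reference, both modes are started the same way, by writing the action name to the array's `sync_action` file in sysfs, and `mismatch_cnt` afterwards reports how many sectors the last pass found inconsistent.  A minimal sketch, assuming the usual `/sys/block/<array>/md/` layout and root privileges; the `sysfs_root` parameter and helper names are hypothetical, added only so the logic can be exercised without a real array.

```python
from pathlib import Path

# Actions accepted here; "idle" stops a running check/repair.
VALID_ACTIONS = {"check", "repair", "idle"}


def md_dir(array: str, sysfs_root: str = "/sys/block") -> Path:
    """Path to the md control directory, e.g. /sys/block/md0/md."""
    return Path(sysfs_root) / array / "md"


def start_scrub(array: str, action: str = "check",
                sysfs_root: str = "/sys/block") -> None:
    """Request a scrub: equivalent to `echo check > .../md/sync_action`."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")


def mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Read mismatch_cnt as reported after the most recent check/repair."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())
```

On a live system that would be `start_scrub("md0", "check")`, then reading `mismatch_count("md0")` once `sync_action` reads "idle" again.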

> The other thing is, mdadm 3.3 ... I'd upgrade that if I were you. It's
> got known bugs including problems with mirrors ... I don't think there's
> any problems with scrubs, though, so it's a "general principles" advice
> to upgrade, not a "you need to".

There are assembly bugs floating around in that general span of
versions, so an upgrade is a good idea.

Phil