Re: Fault tolerance with badblocks

On 8 May 2017, Phil Turmel verbalised:

> On 05/08/2017 10:50 AM, Nix wrote:
>
>> I wonder... scrubbing is not very useful with md, particularly with RAID
>> 6, because it does no writes unless something mismatches,
>
> This is wrong.  The purpose of scrubbing is to expose any sectors that
> have degraded (as Wol describes) to the point of generating a read
> error.  A "check" scrub only writes back to the sectors that report a
> URE, giving the drive firmware a chance to fix or relocate the sector.
>
> A check scrub will NOT write on mismatch, just increment the mismatch
> counter.  This is the recommended regular scrubbing operation.  You want
> to know when mismatches occur.

And... then what do you do? On RAID-6, it appears the answer is "live
with a high probability of inevitable corruption". That's not very good.
(AIUI, if a check scrub finds a URE, it'll rewrite the sector, and in the
common case where the drive spares it out and the write succeeds, this
will not be reported as a mismatch: is this right?)
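
For reference, the "check" scrub Phil describes is driven through md's
sysfs interface: write `check` to `/sys/block/<dev>/md/sync_action`,
wait for it to read back `idle`, then read `mismatch_cnt`. A minimal
Python sketch follows, assuming root privileges on a real system; the
helper names and the `sysfs_root` parameter are hypothetical, the latter
existing only so the code can be pointed at a fake directory tree.

```python
from __future__ import annotations

import time
from pathlib import Path

# Default sysfs mount point; overridable (hypothetical parameter) so the
# helpers can be exercised against a fake directory instead of a real array.
SYSFS = Path("/sys")


def md_dir(device: str, sysfs_root: Path = SYSFS) -> Path:
    """Return the md control directory for an array, e.g. /sys/block/md0/md."""
    return sysfs_root / "block" / device / "md"


def start_check(device: str, sysfs_root: Path = SYSFS) -> None:
    """Request a "check" scrub by writing to sync_action (needs root for real)."""
    (md_dir(device, sysfs_root) / "sync_action").write_text("check\n")


def scrub_status(device: str, sysfs_root: Path = SYSFS) -> tuple[str, int]:
    """Return (current sync_action, mismatch_cnt) for the array."""
    d = md_dir(device, sysfs_root)
    action = (d / "sync_action").read_text().strip()
    mismatches = int((d / "mismatch_cnt").read_text().strip())
    return action, mismatches


def wait_until_idle(device: str, sysfs_root: Path = SYSFS,
                    poll_seconds: float = 60.0) -> int:
    """Poll until sync_action reads "idle", then return mismatch_cnt."""
    while True:
        action, mismatches = scrub_status(device, sysfs_root)
        if action == "idle":
            return mismatches
        time.sleep(poll_seconds)
```

Calling `start_check("md0")` followed by `wait_until_idle("md0")` (with
`md0` standing in for whatever the array is called) would return the
mismatch count once the scrub finishes. Per the kernel's md
documentation that count is in sectors, so it can overstate the number
of distinct problem spots.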

>> If there was a way to get md to *rewrite* everything during scrub,
>> rather than just checking, this might help (in addition to letting the
>> drive refresh the magnetization of absolutely everything).
>
> This is actually counterproductive.  Rewriting everything may refresh
> the magnetism on weakening sectors, but will also prevent the drive from
> *finding* weakening sectors that really do need relocation.

If a sector weakens purely because of neighbouring writes or temperature
or a vibrating housing or something (i.e. not because of actual damage),
so that a rewrite will strengthen it and relocation was never necessary,
surely you've just saved a pointless bit of sector sparing? (I don't
know: I'm not sure what the relative frequency of these things is. Read
and write errors in general are so rare that it's quite possible I'm
worrying about nothing at all. I do know I forgot to scrub my old
hardware RAID array for about three years and nothing bad happened...)

-- 
NULL && (void)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
