Re: RAID1 scrub ignoring read errors?

Phil Turmel <philip@xxxxxxxxxx> · Sun, 2 Dec 2018 14:19:06 -0500

On 12/2/18 1:47 PM, Niklas Hambüchen wrote:
> The answer may simply be that, as described on
> 
>   https://raid.wiki.kernel.org/index.php/Scrubbing_the_drives
> 
> "
> If a read error is encountered, the block in error is calculated and written back.
> If the array is a mirror, as it can't calculate the correct data,
> it will take the data from the first (available) drive and write it back to the dodgy drive.
> Drives are designed to handle this - if necessary the disk sector will be re-allocated and moved.
> "
> 
> However, I am unsure whether this is the case here, as I would expect
> 
> * dmesg to mention that md took this action
> * mdadm to send me an email of this event, as it is an indication that a disk needs replacement
> 

Disks don't need replacing on occasional read errors, because such
errors are normal.  Typical consumer-grade hard drives quote an
unrecoverable read error (URE) rate of under 1 per 10^14 bits read.
That works out to, on average, one URE every 12.5 TB read.  On large
drives and large arrays of drives, that's just a few reads from end to
end.  New drives are much better than that, of course, but will still
have some UREs occasionally.
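
As a rough check of that arithmetic, here is a small Python sketch.  It
assumes the datasheet figure means one URE per 10^14 bits read on
average, and it counts in decimal terabytes.  The function names and the
4 TB example drive are made up for illustration.

```python
# Assumption: the quoted spec means one unrecoverable read error (URE)
# per 1e14 bits read, on average.
BITS_PER_URE = 1e14
BYTES_PER_TB = 1e12  # decimal terabytes, as drive vendors count them


def tb_per_ure(bits_per_ure=BITS_PER_URE):
    """Average terabytes read per URE at the given spec rate."""
    return bits_per_ure / 8 / BYTES_PER_TB


def full_reads_per_ure(drive_tb, bits_per_ure=BITS_PER_URE):
    """Average number of end-to-end reads of one drive per URE."""
    return tb_per_ure(bits_per_ure) / drive_tb


print(tb_per_ure())             # 12.5
print(full_reads_per_ure(4.0))  # 3.125 (hypothetical 4 TB drive)
```

So at the spec-sheet worst case, a 4 TB drive would average one URE
roughly every three full passes, scrubs included.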

The kernel's MD layer tolerates a burst of up to 20 UREs on a single
device, and up to 10 per hour otherwise.  When such errors are
encountered, MD reconstructs (parity raid) or copies (mirrors) the
missing data and writes it back to that sector to give the drive
firmware a chance to fix it (overwrite and verify, possibly with
relocation).  This is expected behavior.  (The write-back can then
produce a write error if the original error was really a communications
problem.)
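
If you want to see whether MD has been correcting read errors on a
member device, the md sysfs interface keeps a per-member counter.  A
minimal sketch, assuming the array is md0 and that each member exposes
an "errors" file under /sys/block/md0/md/dev-*/ as described in the
kernel's md documentation.  The function name is mine, and the counter
is only approximate.

```python
from pathlib import Path


def md_read_error_counts(md_name="md0", sysfs_root="/sys"):
    """Return {member_dir_name: error_count} for one md array.

    Assumes the layout <sysfs_root>/block/<md_name>/md/dev-*/errors.
    Members whose counter is missing or unreadable are skipped.
    """
    md_dir = Path(sysfs_root) / "block" / md_name / "md"
    counts = {}
    if not md_dir.is_dir():
        return counts  # no such array (or md sysfs not available)
    for dev_dir in sorted(md_dir.glob("dev-*")):
        try:
            counts[dev_dir.name] = int((dev_dir / "errors").read_text().strip())
        except (OSError, ValueError):
            continue
    return counts


if __name__ == "__main__":
    for dev, n in md_read_error_counts().items():
        print(f"{dev}: {n} read errors seen without eviction")
```

A non-zero count on one member after a scrub would be consistent with
the rewrite behavior described above.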

Some reading from the archives to help you understand this (and warn you
of your possible problem with timeout mismatch):

https://marc.info/?l=linux-raid&m=135863964624202&w=2
https://marc.info/?l=linux-raid&m=135811522817345&w=1
https://marc.info/?l=linux-raid&m=139050322510249&w=2
https://marc.info/?l=linux-raid&m=133761065622164&w=2
https://marc.info/?l=linux-raid&m=133665797115876&w=2

Hope this helps,

Phil