On 12/2/18 1:47 PM, Niklas Hambüchen wrote:
> The answer may simply that as described on
>
> https://raid.wiki.kernel.org/index.php/Scrubbing_the_drives
>
> "
> If a read error is encountered, the block in error is calculated and written back.
> If the array is a mirror, as it can't calculate the correct data,
> it will take the data from the first (available) drive and write it back to the dodgy drive.
> Drives are designed to handle this - if necessary the disk sector will be re-allocated and moved.
> "
>
> However, I am unsure whether this is the case here, as I would expect
>
> * dmesg to mention that md took this action
> * mdadm to send me an email of this event, as it is an indication that a disk needs replacement
>

Disks don't need replacing on occasional read errors, because such errors are normal. Typical consumer-grade hard drives quote an unrecoverable read error (URE) rate of less than 1 per 10^14 bits read. That works out to, on average, one URE every 12.5 TB read. On large drives and large arrays of drives, that's just a few reads from end to end. New drives are much better than that, of course, but will still have occasional UREs.

The kernel's MD layer tolerates a burst of up to 20 UREs on a single device, and up to 10 per hour otherwise. When such errors are encountered, MD reconstructs (parity raid) or copies (mirrors) the missing data and writes it back to that sector to give the drive firmware a chance to fix it (overwrite and verify, possibly with relocation). This is expected behavior. (The write-back can itself produce a write error if the original error was really a communications problem.)
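For what it's worth, the 12.5 TB figure can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, TB means 10^12 bytes, and the probability estimate assumes errors are independent and occur at exactly the datasheet rate, which is a simplification rather than anything a drive vendor promises.

```python
import math

# Datasheet worst case: one unrecoverable read error per 10^14 bits read.
BITS_PER_URE = 1e14


def tb_per_ure(bits_per_ure: float = BITS_PER_URE) -> float:
    """Average data read per URE, in TB (10^12 bytes)."""
    return bits_per_ure / 8 / 1e12


def p_at_least_one_ure(tb_read: float, bits_per_ure: float = BITS_PER_URE) -> float:
    """Chance of hitting at least one URE while reading tb_read TB.

    Assumption (not from the datasheet): every bit fails independently
    with probability 1 / bits_per_ure.
    """
    bits_read = tb_read * 1e12 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_ure))


if __name__ == "__main__":
    print(f"Average data read per URE: {tb_per_ure()} TB")
    for size_tb in (4, 12.5, 40):
        p = p_at_least_one_ure(size_tb)
        print(f"Reading {size_tb} TB: P(at least one URE) = {p:.2f}")
```

Under that simple model, reading 12.5 TB at the worst-case rate gives roughly a 63% chance of seeing at least one URE, which is why an occasional read error during a scrub is not by itself a sign of a failing disk.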
Some reading from the archives to help you understand this (and warn you of your possible problem with timeout mismatch):

https://marc.info/?l=linux-raid&m=135863964624202&w=2
https://marc.info/?l=linux-raid&m=135811522817345&w=1
https://marc.info/?l=linux-raid&m=139050322510249&w=2
https://marc.info/?l=linux-raid&m=133761065622164&w=2
https://marc.info/?l=linux-raid&m=133665797115876&w=2

Hope this helps,

Phil