Hey Brad,
> Puzzling also as to why md didn't re-write that sector when it found a read error.
I am a bit confused: did you observe that somewhere in my output?

Yes, dmesg reported e.g.

    [2276887.492840] sd 0:0:0:0: [sda] tag#18 CDB: Read(16) 88 00 00 00 00 00 01 3f 78 00 00 00 06 80 00 00
    [2276887.492842] blk_update_request: I/O error, dev sda, sector 20936744

but while there's no output that suggests that md re-wrote that sector, there is also no output that suggests that it didn't. This was one of my questions further up the thread: Should(n't) md print something into dmesg when it does?

When I now run `dd bs=512 if=/dev/sda of=/dev/null skip=20936744 count=1`, it completes without error. That suggests to me that md *did* re-write the sector, but didn't print anything.

As mentioned above, my suspicion is that md started to scrub, re-wrote some sectors with read errors (on both disks) using the data from the respective other device, but then stopped the scrub at the first write error it encountered. Then `smartctl -t short` reported a read error simply because the scrub hadn't gotten to that point yet. Does this theory make sense?

Further, my question remains whether I can tell md to continue past the write error and fix all the read errors it can.

Also, maybe WD Red drives always report Reallocated_Sector_Ct as 0, even if sectors were reallocated? Does anybody know of a report from a WD Red drive where it isn't 0?
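To repeat the `dd` check above for every sector the kernel complained about, rather than just the one, something like the following could be used. It is only a rough sketch: the function names `failed_sectors` and `recheck_sectors` are made up for illustration, it assumes the dmesg lines look like the one quoted above, and it assumes GNU dd, whose `iflag=direct` bypasses the page cache so that the read actually reaches the disk. It only reads; nothing is written.

```shell
# Print "<device> <sector>" for every block-layer I/O error line on stdin.
# Assumes the "I/O error, dev sdX, sector N" format shown in the dmesg excerpt.
failed_sectors() {
    sed -n 's/.*I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p'
}

# Re-read each reported 512-byte sector once and say whether it is readable now.
# Needs root. iflag=direct is a GNU dd option (assumption: GNU coreutils).
recheck_sectors() {
    while read -r dev sector; do
        if dd bs=512 if="/dev/$dev" of=/dev/null skip="$sector" count=1 \
              iflag=direct 2>/dev/null; then
            echo "$dev $sector: readable"
        else
            echo "$dev $sector: still failing"
        fi
    done
}

# Usage (as root, not run here):
#   dmesg | failed_sectors | sort -u | recheck_sectors
```

A sector that reads fine now only shows that it is currently readable; it does not by itself prove that md was the one that re-wrote it.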
> copy the data from the old RAID to the new RAID
That is an option; using --replace like Adam wrote further down the thread also sounds like it was designed for this.
> Drives are cheap. Backups are cheap. Data recovery is expensive.
I'll have a backup no matter which approach to restoration I use.

Niklas