Re: RAID1 scrub ignoring read errors?

Niklas Hambüchen <mail@xxxxxx> · Fri, 7 Dec 2018 07:51:21 +0100

Hey Brad,

> Puzzling also as to why md didn't re-write that sector when it found a read error.

I am a bit confused, did you observe that somewhere in my output?

Yes, dmesg reported e.g.

  [2276887.492840] sd 0:0:0:0: [sda] tag#18 CDB: Read(16) 88 00 00 00 00 00 01 3f 78 00 00 00 06 80 00 00
  [2276887.492842] blk_update_request: I/O error, dev sda, sector 20936744

but while there's no output that suggests that md re-wrote that sector, there is also no output that suggests that it didn't.
This was one of my questions further up the thread:
Should(n't) md print something into dmesg when it does?
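For reference, the Read(16) CDB in the dmesg excerpt above can be decoded to see which sector range the failed request covered. This is only a sketch based on the standard Read(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13); the function name `decode_read16` is made up for illustration.

```shell
#!/bin/sh
# Decode a 16-byte SCSI READ(16) CDB given as space-separated hex bytes.
# Layout: byte 0 opcode (0x88), byte 1 flags, bytes 2-9 LBA (big-endian),
# bytes 10-13 transfer length in logical blocks, byte 14 group, byte 15 control.
decode_read16() {
    # Word-split the single argument into positional parameters $1..$16.
    set -- $1
    [ "$#" -eq 16 ] && [ "$1" = "88" ] || { echo "not a Read(16) CDB" >&2; return 1; }
    lba=$(( 0x$3$4$5$6$7$8$9${10} ))
    len=$(( 0x${11}${12}${13}${14} ))
    echo "start=$lba count=$len end=$(( lba + len - 1 ))"
}

decode_read16 "88 00 00 00 00 00 01 3f 78 00 00 00 06 80 00 00"
# prints: start=20936704 count=1664 end=20938367
```

The reported sector 20936744 lies 40 sectors into that request, so the dmesg line is consistent with a single bad sector inside a larger read.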

When I now `dd bs=512 if=/dev/sda of=/dev/null skip=20936744 count=1`, that completes without error.
That suggests to me that it *did* re-write the sector, but didn't print anything.
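One caveat with the `dd` check: a buffered read may be served from the page cache rather than from the disk. A minimal sketch of a stricter check follows, reading a small range one sector at a time. `scan_sectors` and the `DD_FLAGS` variable are names I made up; `iflag=direct` is a GNU dd option and assumes the device's logical sector size is 512 bytes.

```shell
#!/bin/sh
# Read COUNT 512-byte sectors starting at START from DEV, one at a time,
# and print the number of every sector that dd fails to read.
# DD_FLAGS is a hypothetical knob: set it to "iflag=direct" (GNU dd) on a
# real block device so that reads bypass the page cache.
scan_sectors() {
    dev=$1 start=$2 count=$3
    s=$start
    while [ "$s" -lt $(( start + count )) ]; do
        dd if="$dev" of=/dev/null bs=512 skip="$s" count=1 ${DD_FLAGS:-} 2>/dev/null \
            || echo "unreadable: $s"
        s=$(( s + 1 ))
    done
}

# Example (needs root, device name as in the dmesg output above):
#   DD_FLAGS=iflag=direct scan_sectors /dev/sda 20936744 8
```

No output means every sector in the range was readable.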

As mentioned above, my suspicion is that md started to scrub, re-wrote some unreadable sectors (on both disks) using the data from the other mirror, but then stopped the scrub at the first write error it encountered.
Then `smartctl -t short` reported a read error simply because the scrub hadn't gotten to that point yet.
Does this theory make sense?
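One way to gather evidence for or against this theory is md's sysfs state, which shows whether a scrub is running, how far it got, and an approximate per-member count of read errors that did not get the device kicked out. A rough sketch, assuming the array is called md0; `md_scrub_state` and the `SYSFS_ROOT` override are hypothetical names, the files are the ones described in the kernel's md documentation.

```shell
#!/bin/sh
# Print scrub-related md state for the array named in $1 (e.g. md0).
# SYSFS_ROOT is a hypothetical override so the function can be pointed at a
# copy of the tree; it defaults to /sys.
md_scrub_state() {
    md="${SYSFS_ROOT:-/sys}/block/$1/md"
    # Array-wide state: current action, progress, mismatches found so far.
    for f in sync_action sync_completed mismatch_cnt; do
        [ -r "$md/$f" ] && echo "$f: $(cat "$md/$f")"
    done
    # Per-member count of read errors that did not evict the device.
    for d in "$md"/dev-*; do
        [ -r "$d/errors" ] && echo "${d##*/} errors: $(cat "$d/errors")"
    done
    return 0
}

# Example: md_scrub_state md0
```

A non-zero `errors` value on a member, with `sync_action` back at `idle`, would at least be consistent with reads having failed and been handled without the device being dropped.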

Further, my question remains whether I can tell md to continue past the write error and fix all read errors it can.

Also, maybe WD Red drives always report Reallocated_Sector_Ct as 0, even if sectors were reallocated?
Does anybody know of a report from a WD Red drive where it isn't 0?
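To make such reports easier to compare, the relevant raw counters can be pulled out of the `smartctl -A` attribute table. A small sketch, assuming the usual smartmontools table layout where the attribute name is the 2nd column and the raw value the 10th; `smart_raw` is a made-up helper name.

```shell
#!/bin/sh
# Print the raw value (10th column) of the reallocation-related attributes
# from `smartctl -A` output read on stdin.
smart_raw() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# Example (needs root): smartctl -A /dev/sda | smart_raw
```

A drive with a non-zero Current_Pending_Sector but a Reallocated_Sector_Ct of 0 would not by itself prove that the counter is stuck, since a pending sector that is rewritten successfully need not be reallocated.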

> copy the data from the old RAID to the new RAID

That is an option; using --replace like Adam wrote further down the thread also sounds like it was designed for this.
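For the --replace route, the sequence would roughly be: add the new disk as a spare, then mark the failing member for replacement. The sketch below only prints the commands unless `RUN=1` is set; `md_replace`, `run` and `RUN` are names I made up, and all device names are placeholders. The `--add`, `--replace` and `--with` options are the ones documented in the mdadm man page.

```shell
#!/bin/sh
# Print a command, or execute it when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# Hot-replace member OLD of ARRAY with NEW.
# The new device must be a spare in the array before --with can use it.
md_replace() {
    array=$1 old=$2 new=$3
    run mdadm "$array" --add "$new"
    run mdadm "$array" --replace "$old" --with "$new"
}

# Dry run with placeholder device names:
md_replace /dev/md0 /dev/sda1 /dev/sdc1
```

Keeping it a dry run by default makes it harder to kick off a rebuild on the wrong device by accident.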

> Drives are cheap. Backups are cheap. Data recovery is expensive.

I'll have a backup no matter which approach to restoration I use.

Niklas