Re: RAID1 scrub ignoring read errors?

Niklas Hambüchen <mail@xxxxxx> · Thu, 6 Dec 2018 15:33:08 +0100

On 2018-12-04 01:27, Brad Campbell wrote:
> Try running a read on the disk with :
> dd if=/dev/sdX of=/dev/null bs=1M conv=noerror

Hey Brad, thanks for your reply!

I first tried reading just the area around the first problematic sector, 1758544.
First, the sector directly before it:

  # dd bs=512 if=/dev/sdb of=/dev/null skip=1758543 count=1
  1+0 records in
  1+0 records out
  512 bytes copied, 0,00713634 s, 71,7 kB/s

Now the problematic sector:

  # dd bs=512 if=/dev/sdb of=/dev/null skip=1758544 count=1
  dd: error reading '/dev/sdb': Input/output error
  0+0 records in
  0+0 records out
  0 bytes copied, 7,00467 s, 0,0 kB/s

The error came after 7 seconds, so the timeouts seem to be working as expected.
After that failed read, smartctl showed:

  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  ...
  197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1

So the pending-sector counter does increase on a failed read, as expected.
Why did it not increase when the RAID1 scrub hit the read failures, though?
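
To compare the counter before and after a scrub, something like the following could pull just the raw value out of the attribute table. This is only a sketch: `pending_sectors` is a helper name I made up, and it assumes the column layout shown above (ID in column 1, raw value in column 10).

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw value of attribute 197 (Current_Pending_Sector).
# Assumes the 10-column attribute table layout shown above.
pending_sectors() {
  awk '$1 == 197 && $2 == "Current_Pending_Sector" { print $10 }'
}

# Usage (as root), e.g. once before and once after the scrub:
#   smartctl -A /dev/sdb | pending_sectors
```

If the number stays the same across a scrub that logged read errors, that would confirm what I saw.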

I am now running the dd you suggested on the whole disk, which will take a couple of hours.
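
In the meantime, a narrower probe of the sectors around a known-bad one could be scripted roughly like this. It is a sketch that just repeats the single-sector dd from above in a loop. `probe_sectors` is a made-up name, and the reads go through the page cache exactly as in my manual commands, so a sector that was read successfully before may be served from cache.

```shell
# Hypothetical helper: read sectors START..END (inclusive, 512-byte units)
# of DEVICE one at a time and report which reads fail.
# Relies on dd exiting non-zero on a read error (no conv=noerror here).
probe_sectors() {
  dev=$1
  s=$2
  end=$3
  while [ "$s" -le "$end" ]; do
    if dd if="$dev" of=/dev/null bs=512 skip="$s" count=1 2>/dev/null; then
      echo "$s ok"
    else
      echo "$s FAILED"
    fi
    s=$((s + 1))
  done
}

# Usage (as root), e.g. a few sectors on either side of 1758544:
#   probe_sectors /dev/sdb 1758540 1758548
```

Each failing sector costs the full 7-second timeout, so this is only practical for small ranges.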

Recovery:

I'd also like to ask what my recovery strategy should be.
My current understanding is that some sectors are unreadable on sda and some are unreadable on sdb.
As per the explanations so far, these can be fixed by rewriting them from the other device.
Now, sda seems to be truly broken, given that the RAID scrub reported that the write failed.
This means that if I replace sda with a new disk first, I will no longer be able to recover the unreadable sectors on sdb, because the copies on sda would be gone.

Ideally I would be able to first fix all unreadable sectors on sdb by copying the relevant sectors from sda.
But I don't know if that's possible, because it seems the scrub stops at the first write error to sdb.

What should I do?
Should I get a third disk and turn the md into a triple-RAID1 (would this continue the scrub/sync even if sdb has a write failure)?
Or is there a way I can tell the scrub to continue past write errors and fix as many read errors on as many devices as possible?

Thanks,
Niklas