Re: read errors with md RAID5 array

Tim Small <tim@xxxxxxxxxxxxxxxx> · Mon, 15 Aug 2016 15:42:38 +0100

On 15/08/16 14:57, Chris Murphy wrote:
> $ sudo smartctl -l scterc <dev>   ## for each device used in the array
> $ sudo cat /sys/block/<dev>/device/timeout   ## for each device used in the array

These were all reporting:

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

However, I'm not sure how this would cause a read error from the md
device itself.  There are no timeout/reset messages in the kernel logs
for the underlying SATA devices.

To check, I've set the ERC on all drives to 6.5 seconds for both reads
and writes, and restarted the "dd if=/dev/md2 of=/dev/null
conv=noerror", and it's just produced read failures at exactly the same
places, with no further kernel messages.
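
For reference, a sketch of how the ERC change above can be applied.
smartctl takes the timeout in units of 100 ms, so 6.5 seconds is 65.
The device names (sda sdb sdc) are placeholders, not the actual array
members, and the DEVICES and RUN variables are my own convention for
this sketch. It only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# SCT ERC timeout in deciseconds: 65 = 6.5 s, used for both read and write.
ERC_DS=65

set_erc() {
    # $1 is a bare device name such as "sda" (placeholder; use the
    # real array members).
    cmd="smartctl -l scterc,$ERC_DS,$ERC_DS /dev/$1"
    if [ "${RUN:-0}" = "1" ]; then
        sudo $cmd        # actually change the drive setting
    else
        echo "$cmd"      # dry run: just show what would be executed
    fi
}

# DEVICES is a hypothetical override; the default list is illustrative.
for dev in ${DEVICES:-sda sdb sdc}; do
    set_erc "$dev"
done
```

On many drives this setting does not survive a power cycle, so it may
need to be reapplied at boot.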

Some scenarios:

1. These are write-hole locations, and the md driver has recorded this
and is failing I/O here (I didn't know it did this, and I couldn't see
it in a quick read through the raid5 code, but I could be wrong as I was
only skimming).

2. Two underlying drives have I/O problems at these locations (but then
why no errors in kernel logs?).

3. Something's bad in the block or ATA layer.

... or something else.
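
If md does keep such a record (scenario 1), it should be visible in the
per-member bad_blocks files under sysfs. A minimal sketch, assuming
those files exist on this kernel and that the array is md2 as in the dd
command; the function name and its optional directory argument are my
own, for illustration only.

```shell
#!/bin/sh
# Print the non-empty per-member bad-block lists for an md array.
# $1 is the array's sysfs md directory; defaults to md2 (an assumption
# taken from the dd command in this thread).
list_md_bad_blocks() {
    md_dir="${1:-/sys/block/md2/md}"
    for f in "$md_dir"/dev-*/bad_blocks; do
        [ -r "$f" ] || continue          # skip unmatched glob / unreadable
        content=$(cat "$f")
        # sysfs files report a nominal size, so test the content itself.
        if [ -n "$content" ]; then
            echo "== $f"
            echo "$content"              # lines of "<start sector> <length>"
        fi
    done
}

list_md_bad_blocks
```

No output would suggest md has nothing recorded for the member devices,
which would point back towards scenarios 2 or 3.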

Cheers,

Tim.