On Mon, Aug 15, 2016 at 8:42 AM, Tim Small <tim@xxxxxxxxxxxxxxxx> wrote:
> On 15/08/16 14:57, Chris Murphy wrote:
>> $ sudo smartctl -l scterc <dev> ## for each device used in the array
>> $ sudo cat /sys/block/<dev>/device/timeout ## for each device used
>> in the array
>
> These were all reporting:
>
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled

You didn't provide the value for the 2nd command. Is it something other
than 30 for each device?

>
> However I'm not sure how this would cause a read error from the md
> device itself? There are no timeout/reset messages in the kernel logs
> for the underlying SATA devices?

Nevertheless, it's a misconfiguration that inhibits proper read error
reporting by the drive, thereby preventing the md driver from fixing bad
sectors by writing good data over them and letting the drive firmware
sort it out. So you should issue 'smartctl -l scterc,70,70 <dev>' for
all devices and make sure this is made persistent at boot time.

>
> To check, I've set the ERC on all drives to 6.5 seconds for both reads
> and writes, and restarted the "dd if=/dev/md2 of=/dev/null
> conv=noerror", and it's just produced read failures at exactly the same
> places, with no further kernel messages.

Well, it isn't really a read error; it's a buffer I/O error that happens
to be triggered when reading, so it's a little more specific than a read
error. It sounds to me like you've run into a bug, or there's some kind
of hardware problem somewhere. It might be helpful if you provide the
entire dmesg from boot until the first error message, as well as the
stuff Andreas asked for.

--
Chris Murphy
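For reference, here is a minimal shell sketch of the check and fix discussed above, assuming the array members are passed by kernel name (sda, sdb, ...). It sets SCT ERC to 70 deciseconds (7.0 s) for reads and writes on each device, then prints the kernel's command timeout from sysfs, so you can confirm the drive gives up on a bad sector well before the kernel's timeout (30 s by default) expires. The function names and the SYSFS_ROOT variable are hypothetical, added only so the sysfs lookup can be pointed at a test directory; they are not part of smartmontools or the kernel.

```shell
#!/bin/sh
# Minimal sketch (assumptions noted): set a 7.0 s SCT ERC limit on each
# array member and report the kernel's command timeout for it.
# Usage: erc-check.sh sda sdb sdc   (kernel device names, no /dev/ prefix)

# SYSFS_ROOT is a hypothetical override so the lookup can be tested
# against a fake directory tree; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the kernel command timeout in seconds for a block device,
# or "unknown" if the sysfs attribute is missing or unreadable.
kernel_timeout() {
    f="$SYSFS_ROOT/block/$1/device/timeout"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo unknown
    fi
}

# Ask the drive to limit read and write error recovery to 70
# deciseconds (7.0 s). Requires root and smartmontools.
set_erc() {
    smartctl -l scterc,70,70 "/dev/$1"
}

main() {
    for dev in "$@"; do
        # smartctl's exit status is a bitmask; report any nonzero value
        set_erc "$dev" || echo "$dev: smartctl exited with status $?" >&2
        echo "$dev: kernel command timeout is $(kernel_timeout "$dev") s"
    done
}

main "$@"
```

For example, `sudo sh erc-check.sh sda sdb sdc sdd` for a four-member array (the script name is a placeholder). Since the ERC setting has to be reapplied at boot, the same script could be called from whatever boot-time hook your distribution uses.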