On Fri, Oct 28, 2016 at 01:22:31PM +0100, Alexander Shenkin wrote:
> One remaining question: is sdc definitely toast?

In my opinion a drive is toast starting from the very first
reallocated/pending/uncorrectable sector. Your drive has several of those,
and that's only the ones the drive already knows about - there may be more.

> Or, is it possible that the Timeout Mismatch (as mentioned by Robin Hill;
> thanks Robin) is flagging the drive as failed, when something else is at
> play and perhaps the drive is actually fine?

I don't believe in timeout mismatches, either. The timeouts are generous.
Waiting for a disk to wake from standby is not a problem, and that takes
ages already. If a disk gets stuck even longer in error correction limbo
and it gets kicked because of it - IMHO that's the right call.

A disk that is unable to read its data, a disk that refuses to write data,
a disk that needs help from the RAID layer to correct its errors, should be
kicked because it's not able to pull its own weight.

You need drives that work without errors, without outside help, because
during a rebuild, when the RAID is already degraded, there won't be any
outside help. Either the disks work or your RAID is dead.

RAID redundancy is supposed to allow disks to be replaced (mdadm --replace).
If you use it instead to keep fixing errors on other disks, there is no
real redundancy left.

In a RAID, if one of your disks has errors, you get rid of it as soon as
possible.

Whether your RAID failed because of timeouts or not is not important.
It failed because you didn't notice broken disks in time, and you had two
of them. Testing, monitoring, and actually acting on the first error are
what's important.

People have different opinions on this. Someone might argue.
It's up to you what risks to take.

Regards
Andreas Klauer
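
To make the "toast from the very first bad sector" rule concrete, here is a minimal Python sketch of the kind of check a monitoring job could run. It is not from the original message. It assumes the attribute table printed by `smartctl -A` in its usual ten-column layout, and it assumes the drive reports the attribute names listed in WATCHED (names vary by vendor, so adjust them for your drives). The function name bad_sector_counts and the SAMPLE text are made up for illustration; the sample is not the output of the drive discussed in this thread.

```python
import re

# Attribute names assumed here; some vendors use different names.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def bad_sector_counts(smartctl_output):
    """Return {attribute: raw_value} for watched attributes whose raw value is nonzero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Expected row layout (assumption):
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] in WATCHED:
            match = re.match(r"\d+", fields[9])
            if match and int(match.group()) > 0:
                found[fields[1]] = int(match.group())
    return found


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    # Any non-empty result means the drive should be replaced under the rule above.
    print(bad_sector_counts(SAMPLE))
```

In practice the input would be the text printed by `smartctl -A` for each member disk, and any non-empty result would be the cue to swap the disk while the array still has its redundancy.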