On 10/06/2013 06:11 PM, Michał Sawicz wrote:
> On 06.10.2013 23:44, Phil Turmel wrote:
>> The answer is *NO*.  That is not expected.  But it does happen with
>> timeout mismatches, and the double failure you experienced is a common
>> result of error correction timeout mismatch.  Timeout mismatch is where
>> your drives are internally trying to retry reading a bad sector long
>> after the OS has given up.  It is always associated with consumer-grade
>> hard drives in raid arrays.
>
> Right, I knew that consumer HDDs did that, but didn't expect this to
> cause such mayhem. So the takeaway for me is: as soon as you
> see bad blocks on the drive, fail it, otherwise the whole array will
> probably get kicked out sooner or later. Or try to manually force the
> drive to reallocate, and then do a scrub.

No, just fix the timeouts.  Otherwise, you'll be kicking drives out
*way* more often than you think.

Do check your smartctl reports for actual relocations, though.  In my
experience, once you pass single digits, further failures are rapid.

>> You might want to search the list archives for various combinations of
>> "error recovery", "scterc", "URE" and "timeout mismatch" for a full
>> description of the problem and the recommended ways to avoid it.
>
> Thanks, will do.

Phil
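
For readers wondering what "fix the timeouts" might look like in practice, here is a minimal Python sketch, not something posted in the thread. It only checks for a mismatch and builds the commands; it does not run them. The 7.0 second drive-side limit and the 180 second kernel timeout are assumed values of the kind commonly suggested for this problem, and the helper names are hypothetical. The `smartctl -l scterc,READ,WRITE` option (units of 100 ms) and the `/sys/block/<dev>/device/timeout` file (seconds) are the real interfaces involved.

```python
from typing import List, Optional

# Assumed values, not taken from the thread:
# 70 deciseconds = 7.0 s drive-side error recovery limit,
# 180 s kernel command timeout for drives that cannot limit recovery.
ERC_DECISECONDS = 70
FALLBACK_KERNEL_TIMEOUT_S = 180


def has_timeout_mismatch(erc_deciseconds: Optional[int],
                         kernel_timeout_s: int) -> bool:
    """Return True if the drive may still be retrying after the OS gives up.

    erc_deciseconds is None when error recovery control is disabled or
    unsupported; the drive's retry time is then treated as very long.
    """
    if erc_deciseconds is None:
        return kernel_timeout_s < FALLBACK_KERNEL_TIMEOUT_S
    return erc_deciseconds / 10.0 >= kernel_timeout_s


def scterc_command(device: str) -> List[str]:
    """Build the smartctl call that caps read/write recovery time."""
    limit = f"scterc,{ERC_DECISECONDS},{ERC_DECISECONDS}"
    return ["smartctl", "-l", limit, device]


def kernel_timeout_path(device: str) -> str:
    """Return the sysfs file holding the kernel command timeout in seconds."""
    name = device.rsplit("/", 1)[-1]
    return f"/sys/block/{name}/device/timeout"
```

A drive with recovery control disabled and a 30 second kernel timeout counts as mismatched here. The sketch assumes the usual remedy is to run the `scterc_command` result as root at every boot, or, where the drive rejects it, to write the fallback value to the `kernel_timeout_path` file instead.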