On 11/14/2016 11:03 AM, Bruce Merry wrote:
> On 14 November 2016 at 17:58, Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
>> On 14/11/16 15:52, Bruce Merry wrote:
>>> On 13 November 2016 at 23:06, Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
>>>>> Sounds like that drive could need replacing. I'd get a new drive and do
>>>>> that as soon as possible - use the --replace option of mdadm - don't
>>>>> fail the old drive and add the new.
>>> Would you mind explaining why I should use --replace instead of taking
>>> out the suspect drive? I guess I lose redundancy for any writes that
>>> occur while the rebuild is happening, but I'd plan to do this with the
>>> filesystem unmounted so there wouldn't be any writes.
>>
>> Because a replace will copy from the old drive to the new, recovering
>> any failures from the rest of the array. A fail-and-add will have to
>> rebuild the entire new array from what's left of the old, stressing the
>> old array much more.

I entirely endorse Anthony's advice on this one. You are at great risk
of not completing a fail/add resync with the new drive.

> Okay, I can see how for RAID5 that might be a bad thing.
>
> In my case however, it sounds like --replace will copy everything from
> the failing drive, whereas I'd rather it copied everything from the
> good drive. Same stress on the array, less chance of copying dodgy
> data.

You simply don't have that choice, sorry. And drives returning dodgy
data is ungodly rare. The sector checksum algorithms are that good.
You have a URE crisis in your array that is far more significant.

Phil
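
For anyone following along, the two procedures being compared can be sketched roughly as below. This is only an illustration, not a recipe for this particular array: the device names (/dev/md0, /dev/sdb1, /dev/sdc1) and the two helper functions are made up for the example, and the script only prints the mdadm command lines, it does not run them. It assumes an mdadm recent enough to support --replace with --with.

```python
# Illustrative sketch only: builds mdadm command lines, never executes them.
# The array and device names used below are hypothetical placeholders.

def replace_commands(array, old, new):
    """In-place replacement: the old device stays in the array while md
    copies onto the new one, falling back to the rest of the array for
    any sectors the old device cannot return."""
    return [
        ["mdadm", array, "--add", new],                     # new device joins as a spare
        ["mdadm", array, "--replace", old, "--with", new],  # copy old -> new
    ]


def fail_and_add_commands(array, old, new):
    """Fail-and-add: the old device is dropped first, so the new one must
    be rebuilt entirely from the remaining members of the array."""
    return [
        ["mdadm", array, "--fail", old],
        ["mdadm", array, "--remove", old],
        ["mdadm", array, "--add", new],
    ]


if __name__ == "__main__":
    array, old, new = "/dev/md0", "/dev/sdb1", "/dev/sdc1"  # hypothetical names
    for title, build in (("replace", replace_commands),
                         ("fail-and-add", fail_and_add_commands)):
        print(title + ":")
        for cmd in build(array, old, new):
            print("  " + " ".join(cmd))
```

The point of the first form is that the array never drops below its current redundancy during the copy, which is why it is the safer choice when the remaining members may have unreadable sectors of their own.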