On 06/07/2016 09:39 PM, Brad Campbell wrote:
> On 07/06/16 21:04, Phil Turmel wrote:
>> On 06/07/2016 12:51 AM, Marc MERLIN wrote:
>
>>> Right, I understand now, good to know.
>>> So I'll use badblocks next time I have this issue.
>>
>> Or just ignore them. You aren't using them, so they can't hurt you.
>
> That's actually not necessarily true.
>
> If you have a dud sector early on the disk (so before the start of the
> RAID data) you will terminate every SMART long test in the first couple
> of meg of the disk. So while a dud down there won't necessarily impact
> your usage from a RAID perspective, it'll knacker your ability to
> regularly check the disks in their entirety. SMART tests abort on the
> first bad read.

Don't bother doing long self-tests on drives participating in an array
-- check scrubs do everything a long self-test does on the area of
interest, plus actually fixing UREs that are found. And check scrubs
don't abort on a read failure.

My advice stands: ignore the UREs in unused areas of the disk.

> It's ugly, but in the single instance I had that happen, I removed the
> drive from the array, wrote zero to the entire disk and then added it
> back. That forced a reallocation in the affected area.

Completely pointless exercise that opened a window of higher risk of
failure of your array. Unless you used --replace with another spare to
maintain redundancy on your array while that disk was out.

> Usually if it is in the RAID zone, a check scrub will clear it up.
> Having said that I've had a very peculiar one here in the last couple of
> days.
>
> A WD 2TB Green drive with TLER set to 7 seconds. The first read would
> error out in 7 seconds (as it should), but a second read succeeded.
> After returning the error, the drive must have kept trying to recover in
> the background and eventually succeeded and cached the result. So
> subsequent reads were ok. After reading and writing enough to other
> parts of the drive to flush the drive's cache, the process would repeat.

Pure speculation. Unless you can show better evidence that those drives
will cache a read in that case, I would say it was just a mild enough
weak spot that it randomly succeeded more than not.

And if you follow my advice, it doesn't matter: if the array is the only
process reading from the disk, the first appearance of the URE would be
the last, as the array would re-write it immediately. Whether during a
scrub or due to normal access.

Regular long self-tests are highly recommended for stand-alone disks and
for array hot spares.

Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html