Re: Raid check didn't fix Current_Pending_Sector, but badblocks -nsv did

Brad Campbell <lists2009@xxxxxxxxxxxxxxx> · Wed, 8 Jun 2016 09:39:45 +0800

On 07/06/16 21:04, Phil Turmel wrote:
> On 06/07/2016 12:51 AM, Marc MERLIN wrote:
>
>> Right, I understand now, good to know.
>> So I'll use badblocks next time I have this issue.
>
> Or just ignore them.  You aren't using them, so they can't hurt you.

That's actually not necessarily true.

If you have a dud sector early on the disk (so before the start of the 
RAID data) you will terminate every SMART long test in the first couple 
of meg of the disk. So while a dud down there won't necessarily impact 
your usage from a RAID perspective, it'll knacker your ability to 
regularly check the disks in their entirety. SMART tests abort on the 
first bad read.
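
To make the "before the start of the RAID data" test concrete, here is a
minimal sketch. It assumes a v1.x superblock (where mdadm --examine
reports a "Data Offset" in sectors) and that the failing LBA is taken
from the LBA_of_first_error column of smartctl -l selftest. The function
name and the example numbers are invented for illustration.

```python
def error_is_before_raid_data(first_error_lba, data_offset_sectors,
                              member_start_lba=0):
    """Return True if a failing LBA lies before the md data area.

    first_error_lba     -- LBA reported by the aborted SMART self-test
    data_offset_sectors -- "Data Offset" from mdadm --examine (assumed v1.x)
    member_start_lba    -- start of the member partition on the disk;
                           0 for a whole-disk member
    """
    data_start = member_start_lba + data_offset_sectors
    return first_error_lba < data_start


# Hypothetical numbers: a 262144-sector data offset, self-test died at LBA 3000.
print(error_is_before_raid_data(3000, 262144))
```

If this returns True, a check scrub will never read that sector, so it
will never get the chance to rewrite it.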

It's ugly, but in the single instance I had that happen, I removed the 
drive from the array, wrote zeros to the entire disk and then added it 
back. That forced a reallocation in the affected area.
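
For reference, a rough sketch of that sequence. I didn't record the
exact commands, so treat these as an assumption: it presumes a
whole-disk member (no partition table to restore) and uses placeholder
device names. The helper only builds and prints the commands; it does
not run anything.

```python
import shlex


def zero_and_readd_commands(array, member):
    """Build (but do not execute) the remove / zero / re-add sequence.

    array  -- md device, e.g. /dev/md0 (placeholder)
    member -- whole-disk member, e.g. /dev/sdX (placeholder)
    """
    return [
        ["mdadm", array, "--fail", member],
        ["mdadm", array, "--remove", member],
        # Overwrites the whole device, md superblock included.
        ["dd", "if=/dev/zero", f"of={member}", "bs=1M", "oflag=direct"],
        ["mdadm", array, "--add", member],
    ]


for cmd in zero_and_readd_commands("/dev/md0", "/dev/sdX"):
    print(shlex.join(cmd))
```

Bear in mind the array runs degraded while this is going on, and since
the superblock gets wiped along with everything else, the re-add is a
full rebuild.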

Usually, if it is in the RAID zone, a check scrub will clear it up. 
Having said that, I've had a very peculiar one here in the last couple 
of days.

A WD 2TB Green drive with TLER set to 7 seconds. The first read would 
error out in 7 seconds (as it should), but a second read succeeded. 
After returning the error, the drive must have kept trying to recover in 
the background and eventually succeeded and cached the result. So 
subsequent reads were ok. After reading and writing enough to other 
parts of the drive to flush the drive's cache, the process would repeat.
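
For anyone wanting to set the same timeout: smartctl takes it via
-l scterc, with the read and write limits given in tenths of a second,
so 7 seconds is 70. A small sketch that builds the command (the helper
name and device are placeholders, and not every drive accepts the
setting):

```python
def scterc_command(device, read_seconds=7.0, write_seconds=7.0):
    """Build the smartctl command that sets SCT Error Recovery Control.

    smartctl expects the limits in units of 100 ms, so seconds are
    multiplied by 10 and rounded to the nearest whole unit.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


print(" ".join(scterc_command("/dev/sdX")))
```

On many drives the value does not survive a power cycle, so it usually
has to be reapplied at boot.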

In this case, it took about 3 check scrubs to actually hit the read 
error and force a re-write.
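
Repeating the scrub is easy enough to script. A minimal sketch, assuming
the usual md sysfs layout (/sys/block/mdX/md/sync_action); the function
name, the polling interval and the injectable sleep are my own choices
for the example:

```python
import time
from pathlib import Path


def run_check(md_dir, poll_seconds=60, sleep=time.sleep):
    """Start one check scrub and block until the array is idle again.

    md_dir -- the array's md sysfs directory, e.g. /sys/block/md0/md
    sleep  -- injectable so the loop can be exercised without real waits
    """
    action = Path(md_dir) / "sync_action"
    action.write_text("check\n")          # same as: echo check > sync_action
    while action.read_text().strip() != "idle":
        sleep(poll_seconds)
```

Calling run_check("/sys/block/md0/md") in a loop (as root) and looking
at Current_Pending_Sector in smartctl -A between passes shows whether a
given pass actually tripped over the sector and rewrote it.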
