Re: 3TB drives failure rate

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Sun, 28 Oct 2012 15:51:40 -0600

On Oct 28, 2012, at 3:18 PM, Roman Mamedov <rm@xxxxxxxxxx> wrote:

> Now you are moving into unfounded superstitions territory.

Umm, trusting the data on drives wholesale is the superstition.

> 
> You seem to imply that a Green drive would return just plain bad data in some
> failure condition (else why all the checksumming FSes?), and not an IO Error.

A drive with a long error recovery window certainly has a LOT longer to produce a result other than a read error. And we know ECC doesn't always correct errors properly. So statistically there is a much larger window for bogus data to be returned that the drive thinks it has legitimately reconstructed. Absolutely.

> I don't think anything of this sort has been demonstrated so far, and while
> this could happen due to bad RAM/chipset/controller/bus/cache/etc, this would
> have nothing to do specifically with "Greenness" of a drive, nor any
> particular model would be inherently more prone to that.

It's a particular consumer drive that happens to lack a way to disable ERC. The consumer Hitachi Deskstars do have one, last I encountered them. I don't mean to pick on just the Green drives, and you're right that this isn't about the greenness of the drive. It is the particular SCT ERC behavior and, even more than that, the ECC. Some implementations are better than others.
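For anyone who wants to check what a given drive does with SCT ERC, here is a minimal Python sketch that wraps smartmontools' `smartctl -l scterc` and pulls out the read and write timers. The sample output text and the function names (`parse_scterc`, `query_scterc`) are my own assumptions for illustration. The exact wording smartctl prints can differ between versions and drives, so treat the parsing as a starting point, not something authoritative.

```python
import re
import subprocess

# ASSUMPTION: illustrative smartctl output; real output may differ by version.
SAMPLE_OUTPUT = (
    "SCT Error Recovery Control:\n"
    "           Read:     70 (7.0 seconds)\n"
    "          Write:     70 (7.0 seconds)\n"
)


def parse_scterc(output):
    """Parse `smartctl -l scterc` text.

    Returns a dict such as {'read': 70, 'write': 70}, with values in
    tenths of a second, or None for a timer reported as Disabled.
    Returns None if no Read/Write lines are found (for example when the
    drive does not support the command).
    """
    result = {}
    for line in output.splitlines():
        match = re.match(r"\s*(Read|Write):\s+(\d+|Disabled)", line)
        if match:
            key = match.group(1).lower()
            value = match.group(2)
            result[key] = None if value == "Disabled" else int(value)
    return result or None


def query_scterc(device):
    """Run smartctl against a device (hypothetical helper, usually needs root)."""
    proc = subprocess.run(
        ["smartctl", "-l", "scterc", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_scterc(proc.stdout)
```

Calling `query_scterc("/dev/sdX")` with a real device name in place of `sdX` should return the timers where the drive exposes them, and `None` where it does not, which is the case I'm describing above.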

> 
>> But still, once a drive is asked to retrieve an LBA, so long as the drive eventually reports it back correctly, the file system won't correct that sector merely for a delay, even if it is up to 2 minutes or whatever it is. So, filesystem choice doesn't really solve the delay problem. You just have to obliterate the disk periodically with zeros or secure erase.
> 
> I do not think there is a state in modern HDDs that there would be a sector
> which consistently takes 30-120 seconds to read. Those are either unreadable at
> all, or readable after a delay -- and then already remapped by the HDD into the
> reserved zone, so the delay is not there the next time.

Could be. Seems unclear. And these behaviors change between firmware revisions so there are all kinds of variables changing constantly.

Chris Murphy
