Re: 3TB drives failure rate

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Sun, 28 Oct 2012 14:50:05 -0600

On Oct 28, 2012, at 2:34 PM, Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:

> 
> That's not true. A drive that can sit there like a bump on a log for 2 minutes before it issues an actual read failure on one or more sectors means md is likewise waiting, and so is everything above it. That's a long hang. I wouldn't be surprised to see users resort to a hard reset after a 45-second system hang, let alone a 2-minute one.
> 
> Ideally you'd get quick error recovery on a clean, normally operating array. As soon as the first drive fails, the system would set the remaining drives to a slightly longer error recovery time, asking them to try a little harder before they error out. If you then get another read error, there is no mirror or parity to rebuild from, so it's best to try a little longer in that degraded state.
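
The state-dependent policy described above could be sketched roughly as follows. This is a hypothetical illustration, not something md or mdadm does on its own: the function names and the 7 s / 20 s limits are assumptions, and the code only builds `smartctl -l scterc,READ,WRITE` command lines (values in tenths of a second) rather than running them. It also rejects any limit at or above the kernel's SCSI command timer, assumed here to be the 30-second default in `/sys/block/<dev>/device/timeout`, because a drive still retrying past that point causes a command timeout instead of a clean read error.

```python
# Assumed default of the Linux SCSI command timer, in seconds.
KERNEL_CMD_TIMEOUT_S = 30.0

# Hypothetical error recovery limits in seconds (not values from the post).
HEALTHY_ERC_S = 7.0    # give up quickly; md can rebuild from mirror/parity
DEGRADED_ERC_S = 20.0  # no redundancy left, so let the drive try harder


def erc_for_array(degraded: bool) -> float:
    """Pick an error recovery limit based on whether the array is degraded."""
    return DEGRADED_ERC_S if degraded else HEALTHY_ERC_S


def scterc_command(device: str, erc_s: float,
                   kernel_timeout_s: float = KERNEL_CMD_TIMEOUT_S) -> list[str]:
    """Build (but do not run) a smartctl command setting SCT ERC on `device`."""
    if erc_s >= kernel_timeout_s:
        # The drive must give up before the kernel does, or md never
        # receives a read error it could correct.
        raise ValueError("ERC limit must be below the kernel command timer")
    deciseconds = round(erc_s * 10)  # smartctl expects units of 100 ms
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]
```

For example, `scterc_command("/dev/sdb", erc_for_array(degraded=True))` yields `["smartctl", "-l", "scterc,200,200", "/dev/sdb"]`; the device name is a placeholder.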

In fact, the long error recovery of consumer drives *prevents* automatic correction by md, or at least significantly delays it.

If that drive can recover a sector in 30 seconds (let alone 2 minutes) instead of failing it, the sector will not be corrected: md never gets an alternate copy from the mirror or from parity, and never overwrites that obviously flaky sector. Instead you get a propensity for bad sectors to accumulate, even with regular check scrubs.
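
The failure mode above can be modeled as a race between three clocks: how long the drive needs to recover the sector, its error recovery limit (if any), and the kernel's command timer. The sketch below is a simplified, hypothetical model with made-up names; it ignores kernel retries and drive-specific quirks. A consumer drive with no ERC limit is represented by `erc_s=None`, and an unrecoverable sector by `recovery_s=float("inf")`.

```python
from enum import Enum
from typing import Optional


class ReadOutcome(Enum):
    RECOVERED_SILENTLY = "drive returned data; md sees no error, sector not rewritten"
    ERROR_RETURNED = "drive returned a read error; md rebuilds and rewrites the sector"
    COMMAND_TIMEOUT = "kernel command timer expired before the drive answered"


def classify_read(recovery_s: float, erc_s: Optional[float],
                  kernel_timeout_s: float = 30.0) -> ReadOutcome:
    """Hypothetical model of what md sees for one marginal sector."""
    limit = erc_s if erc_s is not None else float("inf")  # None: no ERC limit
    # The drive answers when it either recovers the data or hits its limit.
    drive_done_s = min(recovery_s, limit)
    if drive_done_s >= kernel_timeout_s:
        return ReadOutcome.COMMAND_TIMEOUT
    if recovery_s <= limit:
        # Success, however slow: md has no reason to rewrite the sector.
        return ReadOutcome.RECOVERED_SILENTLY
    return ReadOutcome.ERROR_RETURNED
```

For example, `classify_read(120, None, kernel_timeout_s=180)` returns `RECOVERED_SILENTLY`: raising the kernel timeout above the drive's worst-case recovery time (a commonly suggested workaround for drives without ERC) avoids the timeout, but the marginal sector still never gets rewritten. With a short limit, `classify_read(25, 7)` returns `ERROR_RETURNED`, which is the case where md can fix the sector.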

For any serious use I just wouldn't use the Greens, not without very non-consumer-like scrubs, extended SMART tests, and cycling drives out so they can be nuked with ATA Enhanced Secure Erase, say once a year or maybe more often. And rigorous backups. With that kind of expertise and dedication should come a budget for better drives.
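
A rough sketch of the scrub and SMART parts of such a routine, assuming an md array named `md0` and a member disk `/dev/sdb` (both hypothetical): md starts a check scrub when `check` is written to `/sys/block/<md>/md/sync_action`, reports in `mismatch_cnt` how many sectors the last check found out of sync, and `smartctl -t long` starts an extended self-test. The helper names are made up, writing to sysfs requires root, and the sysfs root is a parameter only so the code can be exercised against a scratch directory.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def start_md_check(md_name: str, sysfs_root: Path = SYSFS_BLOCK) -> None:
    """Ask md to scrub the array by writing 'check' to its sync_action file."""
    (sysfs_root / md_name / "md" / "sync_action").write_text("check\n")


def read_mismatch_count(md_name: str, sysfs_root: Path = SYSFS_BLOCK) -> int:
    """Return the mismatch count reported after the last check."""
    text = (sysfs_root / md_name / "md" / "mismatch_cnt").read_text()
    return int(text.strip())


def smart_long_test_cmd(device: str) -> list[str]:
    """Build (but do not run) a smartctl command for an extended self-test."""
    return ["smartctl", "-t", "long", device]
```

A nonzero mismatch count after a check is a cue to look more closely at the member drives, for example with the extended SMART test.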

Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html