On Thu, 22 Dec 2005, Bill Davidsen wrote:
> If you are seeing dual drive failures, I suspect your hardware has problems.
> We run multiple 3 and 6 TB databases, and over a dozen 1 TB data caching
> servers, all using a lot of small fast disk, and I haven't seen a real dual
> drive failure in about 8 years.
> We did see some cases which looked like dual failures, it turned out to be a
> firmware limitation, controller not waiting for the bus to settle after a
> real failure, and thinking the next i/o had failed (or similar, in any case a
> false fail on the transaction after the real fail). If you run two PATA
> drives on the same cable in master/slave, it's at least possible that this
> could happen with consumer grade hardware as well. Just a thought, dual
> failures are VERY unlikely unless one triggers the other in some way, like
> failing the bus or cabinet power supply.
Not really, it depends on how lucky you are with your disks. We've had
real dual-drive failures in a system with hot spares, where the second
drive failed during resync.
Now, we have gotten the manufacturer to replace those with another model,
but some of these problems don't occur until a year or two into production
use (I haven't seen it quite this bad before, but I have seen a high
replacement rate ramping up after a while).
Choosing high-end disks usually helps, but raid6 is really great in that
you always have redundancy, even when replacing a failed or failing drive.
If you have a large raidset with a fairly heavy load the resync time can
easily extend into days, if not weeks. With raid5, during that entire
period, if one more drive fails you're screwed.
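
To put rough numbers on that exposure window, here is a minimal sketch. It is
not a measured result: the drive count, the 5% annualized failure rate and the
7-day resync are hypothetical inputs, and the model assumes drives fail
independently at a constant rate. Drives from a bad batch, like the ones
described above, tend to fail together, so real risk can be well above what
this prints.

```python
import math

def p_drive_fails(afr, days):
    """Probability that one drive fails within `days`, assuming a constant
    failure rate derived from an annualized failure rate `afr` (0..1)."""
    rate_per_day = -math.log(1.0 - afr) / 365.0
    return 1.0 - math.exp(-rate_per_day * days)

def p_array_loss(remaining, afr, days, tolerated):
    """Probability that more than `tolerated` of the `remaining` drives fail
    during the resync window (binomial model, independent failures)."""
    p = p_drive_fails(afr, days)
    p_survive = sum(
        math.comb(remaining, k) * p**k * (1.0 - p) ** (remaining - k)
        for k in range(tolerated + 1)
    )
    return 1.0 - p_survive

# Hypothetical example: 12-drive array, one drive already failed.
remaining = 11   # drives still in service during the resync
afr = 0.05       # assumed annualized failure rate per drive
days = 7         # assumed resync duration under load

# Degraded raid5 tolerates no further failure; degraded raid6 tolerates one.
raid5_loss = p_array_loss(remaining, afr, days, tolerated=0)
raid6_loss = p_array_loss(remaining, afr, days, tolerated=1)
print(f"raid5 loss during resync: {raid5_loss:.4%}")
print(f"raid6 loss during resync: {raid6_loss:.4%}")
```

Stretching `days` from a few days to a few weeks, or raising `afr` to mimic a
bad drive model, shows how quickly the raid5 figure grows while raid6 still
has one failure in hand.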
Btw, in our practical usage, we haven't seen that big a difference
between raid5 and raid6, but I guess that depends on your usage pattern.
/Mattias Wadenstein