Goswin von Brederlow wrote:
> On top of that, the stress of rebuilding usually greatly increases the
> chances. And with large RAIDs and today's large disks we are talking
> days to weeks of rebuild time. As you said, the 433 years assume that
> one drive failure doesn't cause another one to fail. In reality that
> seems to be a real factor, though.
I am intrigued as to what this extra stress actually is.
I could understand it if the drives were head-thrashing for hours, but as I
understand it, a rebuild just has all drives reading/writing in an
orderly cylinder-by-cylinder fashion. So while the read/write
electronics are being exercised continuously, mechanically there is not
much going on, except, I guess, for the odd remapped sector that would
involve a seek.
I figure that by far the most common reason for an array to fail due to
another disc being kicked out is undiscovered uncorrectable read
errors.
The risk of hitting these can be reduced by regularly performing md
"check" or "repair" passes - echo check > /sys/block/mdX/md/sync_action.
Regards,
Richard