Maarten wrote:
Peter Grandi wrote:
This weekend I promoted my new 6-disk raid6 array to
production use and was busy copying data to it overnight. The
next morning the machine had crashed, and the array is down
with an (apparent?) 4-disk failure, [ ... ]
Multiple drive failures are far more common than people expect,
and the problem lies in people's expectations, because they don't
do common mode analysis (what's that? many will think).
It IS more common indeed. I'm on my seventh or eighth raid-5 array now,
the first was a 4-disk raid5 40(120) GB array. I've had 4 or 5
two-disk failures happen to me over the years, invariably during
rebuild, indeed.
This is why I'm switching over to raid-6, by the way.
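To put rough numbers on why raid-6 helps during a rebuild, here is a minimal Python sketch. It assumes disks fail independently, each with a hypothetical probability `p` over one rebuild window. The function names and the value of `p` are illustrative, not measured. After one disk is gone, a raid-5 array is lost if any single survivor fails before the rebuild completes, while raid-6 survives until two more survivors have failed.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n disks fail, each independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_loss_during_rebuild(n_disks: int, parity: int, p: float) -> float:
    """One disk has already failed, so n_disks - 1 survive.
    The array is lost if `parity` or more survivors fail before the rebuild
    finishes (raid-5: parity=1, raid-6: parity=2). Assumes independent failures."""
    return p_at_least(parity, n_disks - 1, p)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-disk failure probability during one rebuild window
    for name, parity in (("raid-5", 1), ("raid-6", 2)):
        print(name, f"{p_loss_during_rebuild(6, parity, p):.6f}")
```

Under independence raid-6 comes out far safer. Correlated failures, as discussed below, make both figures optimistic.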
I did not, at any point, lose the array with the two-disk failures
though. I intelligently cloned bad drives with dd_rescue and
reassembled those degraded arrays using the new disks and thus got my
data back.
But still, such events tend to keep me busy for a whole weekend, which
is not too pleasant.
They typically happen all at once at power up, or in short
succession (e.g. 2nd drive fails while syncing to recover from
1st failure).
The typical RAID has N drives from the same manufacturer, of the
same model, with nearly contiguous serial numbers, from the same
shipping carton, in an enclosure where they all are started and
stopped at the same time, run on the same power circuit, at the
same temperature, on much the same load, attached to the same
host adapter or N of the same type. Expecting, as many do, to have
uncorrelated failures is rather comical.
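The common mode point can be made concrete with the beta-factor model from reliability engineering. This is a swapped-in technique that nobody in the thread used. A fraction `beta` of each disk's failure probability `p` is attributed to a shared cause (same PSU, same batch, same heat) that takes out every disk at once. The sketch compares the chance of two or more failures among `n` disks with and without that shared term. `beta`, `p`, and the function names are hypothetical and chosen only for illustration.

```python
def p_two_or_more_independent(n: int, p: float) -> float:
    """P(at least 2 of n disks fail) when each fails independently with probability p."""
    p_none = (1 - p) ** n
    p_exactly_one = n * p * (1 - p) ** (n - 1)
    return 1 - p_none - p_exactly_one


def p_two_or_more_beta_factor(n: int, p: float, beta: float) -> float:
    """Beta-factor model (assumes n >= 2): with probability beta * p a
    common-cause event fails all n disks at once; otherwise each disk fails
    independently with probability (1 - beta) * p. For small p the per-disk
    failure probability stays roughly p, so the comparison is roughly fair."""
    q_common = beta * p
    p_independent = (1 - beta) * p
    return q_common + (1 - q_common) * p_two_or_more_independent(n, p_independent)


if __name__ == "__main__":
    n, p = 6, 0.01  # hypothetical: 6 disks, 1% failure chance each over some period
    print(f"independent model: {p_two_or_more_independent(n, p):.5f}")
    for beta in (0.0, 0.05, 0.1):
        print(f"beta={beta:.2f}  P(>=2 failures)={p_two_or_more_beta_factor(n, p, beta):.5f}")
```

Even a small shared fraction raises the multi-disk failure probability noticeably above what the independent model predicts. That gap is what "expecting uncorrelated failures" misses.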
This is true. However, since I know this, I take care not to make it
too vulnerable: the system is incredibly well cooled, with eight 80 mm
fans cooling the 16(!) disks, and I buy disks in batches of 2, from
different brands and vendors. It does have just one PSU, but I chose a
good one; I think it's a Tagan 550 Watt unit.
In fact (this is my home system), since I cannot afford a DLT drive for
this much data, I have practically no backup, so I really spend a lot
of effort making sure the array stays OK. Yes, I know, this is not a good
idea, but how do I economically back up 3 TB?
In practice I have older disks and/or decommissioned arrays with
"backups", but these are of course not up to date at all.
Given the low cost of USB-connected terabyte drives, I would say "look there"
rather than expect to be able to keep any system totally reliable.
--
Bill Davidsen <davidsen@xxxxxxx>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck