Hello,

On Thu, 19 Dec 2013 15:43:16 +0000 Gruher, Joseph R wrote:

[snip]
> It seems like this calculation ignores that in a large Ceph cluster with
> triple replication, having three drive failures doesn't automatically
> guarantee data loss (unlike a RAID6 array)? If your data is triple
> replicated and a copy of a given piece of data exists on three separate
> disks in the cluster, and you have three disks fail, the odds of those
> being the only three disks with copies of that data should be pretty low
> for a very large number of disks. For the 600-disk cluster, after the
> first disk fails you'd have a 2 in 599 chance of losing the second copy
> when the second disk fails, then a 1 in 598 chance of losing the third
> copy when the third disk fails, so even assuming a triple disk failure
> has already happened, don't you still have something like a 99.9994%
> chance that you didn't lose all copies of your data? And then if there's
> only a 1 in 21.6 chance of having a triple disk failure outpace recovery
> in the first place, that gets you to something like 99.99997% reliability?
>
I think putting that number into perspective with a real event unfolding
just now, in a data center that's not local and where no monkeys are
available, might help.

A 24-disk server, RAID6, one hot spare. 4 years old now, the crappy
Seagates are failing, 6 already replaced. One drive failed 2 days ago,
yesterday nobody was available to go there and swap a fresh one in, last
night the next drive failed, and now somebody is dashing there with 2
spares. ^o^

More often than not the additional strain of recovery will push disks
over the edge, quite aside from the increased likelihood of clustered
failures with certain drive models or at certain ages.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
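P.S. For what it's worth, a quick Python sketch of the quoted
back-of-the-envelope numbers, assuming (as the quoted mail does) 600
disks, exactly 3 replicas per object, independent and uniformly random
failures, and that the first failed disk already held one of the copies:

  # Chance that the 2nd failure hits one of the 2 remaining copies,
  # then the 3rd failure hits the last one.
  n_disks = 600
  p_loss_given_three = (2.0 / (n_disks - 1)) * (1.0 / (n_disks - 2))
  print("P(lose all 3 copies | 3 failures) = %.3g" % p_loss_given_three)
  # -> ~5.58e-06, i.e. ~99.9994% survival even after a triple failure

  # Fold in the quoted 1-in-21.6 odds of a triple failure outpacing
  # recovery in the first place.
  p_loss = (1.0 / 21.6) * p_loss_given_three
  print("P(data loss overall) = %.3g" % p_loss)
  # -> ~2.6e-07, i.e. ~99.99997% reliability

Which is exactly where the weak spot is: the independent, uniformly
random failures that calculation assumes. The RAID6 mess above is what
failures tend to look like in practice.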