On 12/19/2013 09:39 AM, Christian Balzer wrote:
Hello,

In my "Sanity check" thread I postulated yesterday that to get the same redundancy and resilience for disk failures (excluding other factors) as my proposed setup (2 nodes, 2x 11 3TB HDs in RAID6 per node, 2 global hotspares, thus 4 OSDs), the "Ceph way" would need something like 6 nodes with 10 3TB HDs each and 3-way replication (to protect against dual disk failures) to get similar capacity, plus a 7th identical node to allow for node failure/maintenance.

That was basically based on me thinking "must not get caught by a dual disk failure ever again", as that has happened to me twice: once with a RAID5, with the expected consequences, and once with a RAID10, where I got lucky (8 disks total each time).

However something was nagging me at the back of my brain, and it turned out to be my long forgotten statistics classes in school. ^o^ So after reading some articles basically telling the same things, I found this: https://www.memset.com/tools/raid-calculator/

Now this is based on assumptions, onto which I will add some more, but the last sentence on that page is still quite valid. So let's compare the 2 configurations above. I assumed 75MB/s recovery speed for the RAID6 configuration, something I've seen in practice. Basically that's half speed, something that will be lower during busy hours and higher during off-peak hours. I made the same assumption for Ceph with a 10Gb/s network, assuming 500MB/s recovery/rebalancing speed. The rebalancing would have to compete with other replication traffic (likely not much of an issue) and with the actual speed/load of the individual drives involved. Note that if we assume a totally quiet setup, where 100% of all resources would be available for recovery, the numbers would of course change, but NOT their ratios.

I went with the default disk lifetime of 3 years and a 0-day replacement time. The latter of course gives very unrealistic results for anything w/o a hotspare drive, but we're comparing 2 different beasts here.
So, all that said, the results on that page that make sense for this comparison are the RAID6 + 1 hotspare numbers. As in: how likely is a 3rd drive failure in the time before recovery is complete? The replacement setting of 0 gives us the best possible number, and since one would deploy a Ceph cluster with sufficient extra capacity anyway, that's what we shall use.

For the RAID6 setup (12 HDs total) this gives us a pretty comfortable 1 in 58497.9 chance of data loss per year. Alas, for the 70 HDs in the comparable Ceph configuration we wind up with just a 1 in 13094.31 chance, which while still quite acceptable clearly shows where this is going.

So am I completely off my wagon here? How do people deal with this when potentially deploying hundreds of disks in a single cluster/pool?
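The calculator's exact model isn't something I can reproduce here, but the scaling effect can be sketched with a crude exponential/binomial model of my own (the 3-year lifetime and the recovery windows are the assumptions from above; this gives per-incident probabilities, not the calculator's annualized figures):

```python
import math

# Crude sketch (my own model, NOT the memset calculator's): after one drive
# fails, what's the chance that enough *additional* drives fail during the
# recovery window to lose data?  Per-disk failures are exponential; failures
# across disks are treated as independent (binomial).

LIFETIME_H = 3 * 365 * 24  # assumed mean disk lifetime: 3 years, in hours

def p_disk_fails(window_h, lifetime_h=LIFETIME_H):
    """Probability that one disk fails within the window (exponential model)."""
    return 1.0 - math.exp(-window_h / lifetime_h)

def p_data_loss(n_other_disks, failures_to_lose, window_h):
    """P(at least `failures_to_lose` of the remaining disks fail in the window)."""
    p = p_disk_fails(window_h)
    survive = sum(math.comb(n_other_disks, k) * p**k * (1 - p)**(n_other_disks - k)
                  for k in range(failures_to_lose))
    return 1.0 - survive

# RAID6: 12 disks, 1 already down, 2 more failures lose data, ~11 h rebuild.
raid6 = p_data_loss(11, 2, 11.1)
# 3x Ceph: 70 disks, 1 down, worst case 2 more failures hitting overlapping
# PGs lose data, ~1.7 h recovery.  (Pessimistic: not every pair of extra
# failures actually shares a PG with the first.)
ceph = p_data_loss(69, 2, 1.7)
print(raid6, ceph)
```

At a fixed recovery window the risk grows roughly quadratically with the disk count (compare `p_data_loss(11, 2, 11.1)` with `p_data_loss(69, 2, 11.1)`); the faster Ceph recovery claws a lot of that back, which is why the calculator's ratios end up closer than the raw disk counts would suggest.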
I'd suggest using disks from different vendors, which in such a setup probably means mixing Seagate and Western Digital.
That way you rule out bad-batch issues, and the likelihood of identical disks failing at the same time becomes smaller as well.
Also, make sure that you define your crushmap so that replicas never end up on the same physical host and, if possible, not in the same cabinet/rack.
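For reference, a crushmap rule along those lines might look like this (a sketch only; the `default` root, the `rack` bucket type and the ruleset number are assumptions about your hierarchy, and you'd adjust them to match your actual crushmap):

```
rule replicated_rack {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take default
        # pick each replica from a different rack; use "type host"
        # instead if you only have one rack
        step chooseleaf firstn 0 type rack
        step emit
}
```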
I would never run 60 drives in a single machine in a Ceph cluster; I'd suggest you use more machines with fewer disks per machine.
I mean, when we get to 600 disks (and that's just one rack full, OK, maybe 2 due to load and other issues ^o^) of those 4U 60-disk storage servers (or 72 disks per 4U if you're happy with killing another drive when replacing a faulty one in that Supermicro contraption), that ratio is down to 1 in 21.6, which is way worse than the 8-disk RAID5 I mentioned up there.

Regards,
Christian
--
Wido den Hollander
42on B.V.
Phone: +31 (0)20 700 9902
Skype: contact42on