If you don't have 2 power feeds, don't spend the money.
If you have 2 feeds, well, start with 2 PSUs for your switches ;)
If you stick with one PSU for the OSDs, make sure you have your cabling
(power and network; don't forget your network switches should be on the
same power feeds ;) and your CRUSH map right.
With 2 PSUs you can probably live with an individual cabling error, but
not with systematic ones.
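As a quick sanity check (hypothetical host and bucket names, and assuming a
custom failure-domain type like the "powerfeed" one Wido describes further
down), you can compare the physical cabling against the CRUSH hierarchy from
the CLI:

    # show the CRUSH hierarchy: which hosts sit under which failure domain
    ceph osd tree

    # re-home a host that is mapped under the wrong feed in the CRUSH map
    ceph osd crush move node05 powerfeed=feed-b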
On the definition of 2 power feeds: let's say at least behind different
breakers ;)
stijn
On 10/30/2014 03:56 PM, O'Reilly, Dan wrote:
The simple (to me, anyway) answer is "if your data is that important, spend the money to insure it". A few hundred dollars, even over a couple hundred systems, is still good policy as far as I'm concerned when you weigh the possible cost of not being able to access the data against the cost of a power supply.
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Wido den Hollander
Sent: Thursday, October 30, 2014 8:54 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Redundant Power Supplies
On 10/30/2014 03:36 PM, Nick Fisk wrote:
What's everyone's opinion on having redundant power supplies in your
OSD nodes?
One part of me says to let Ceph do the redundancy and plan for the
hardware to fail; the other says they are probably worth having, as
they lessen the chance of losing a whole node.
Considering they can add £200-300 to a server, the cost can add up
over a number of nodes.
My worst-case scenario is where you have dual power feeds A and B. In
that scenario, if power feed B ever goes down (a blown fuse or tripped
breaker, maybe), suddenly half your cluster could disappear and start
doing massive recovery operations. I guess this could be worked around
by setting some sort of subtree limit grouped by power feed.
Thoughts?
I did a deployment with single power supplies because of the reasoning you mention.
Each rack (3 in total) is split into 3 zones, and each zone has its own switch.
In each rack there are 6 machines on power feed A together with a switch, another set of machines on feed B with their own switch, and there is also an STS (static transfer switch) which provides "power feed C" for the third zone.
Should a breaker trip in a cabinet, we'll lose 6 machines at most. If power feed A or B goes down datacenter-wide, we'll lose 1/3 of the cluster.
In the CRUSH map we defined a "powerfeed" bucket type, and we place our replicas across the different power feeds.
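To illustrate (a minimal sketch with made-up names and weights, not the actual map from this deployment), a decompiled CRUSH map with such a custom type could look like this:

    # devices
    device 0 osd.0
    device 1 osd.1

    # types: powerfeed replaces rack/row as the failure domain
    type 0 osd
    type 1 host
    type 2 powerfeed
    type 3 root

    # buckets (only one host per feed shown; feed-c would be defined
    # the same way)
    host node01 {
        id -2
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1.000
    }
    host node07 {
        id -3
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 1.000
    }
    powerfeed feed-a {
        id -4
        alg straw
        hash 0  # rjenkins1
        item node01 weight 1.000
    }
    powerfeed feed-b {
        id -5
        alg straw
        hash 0  # rjenkins1
        item node07 weight 1.000
    }
    root default {
        id -1
        alg straw
        hash 0  # rjenkins1
        item feed-a weight 1.000
        item feed-b weight 1.000
    }

    # rule: place each replica under a different power feed
    rule replicated_powerfeed {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type powerfeed
        step emit
    }

The edited map is compiled and loaded back with "crushtool -c map.txt -o map.bin" followed by "ceph osd setcrushmap -i map.bin".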
mon_osd_down_out_subtree_limit has been set to "powerfeed" to prevent a whole powerfeed from being marked as "out".
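For reference, that is a single setting in the [mon] section of ceph.conf (the default for this option is "rack"):

    [mon]
    mon osd down out subtree limit = powerfeed

With that set, if every OSD under one powerfeed bucket goes down at once, the monitors will not automatically mark them out, so a feed-wide outage does not immediately trigger a full rebalance.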
This way we saved about EUR 300 per machine; across 54 machines that adds up to quite a big saving.
--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com