Simple Math?

Hello,

On Thu, 4 Sep 2014 22:51:35 +0000 Zojonc, Josh wrote:

> I'm trying to figure out if it's possible to compute how many nodes I can
> lose.  I've got a cluster of 6 boxes with 9 disks each, for a total of 54
> OSDs.  There are a total of 1800 placement groups on those OSDs.  The
> replica size is 3.
> 
> Is there a way to figure it out or is more information needed?
> 
3 storage nodes (the replica size), provided you're using the default CRUSH
map or one that, at the very least, distributes PGs across separate hosts.
This also assumes your monitors (at least 3) are on different machines, or
that all your OSD servers have a MON plus one external MON.
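
To put rough numbers on that, here is a back-of-the-envelope Python sketch
(my own illustration, not a Ceph tool; it assumes equal hosts and a CRUSH
rule that keeps each replica on a separate host):

    # Hypothetical sketch of the node-loss arithmetic, not anything Ceph ships.

    def max_simultaneous_losses(replica_size):
        # Any replica_size hosts failing at the same instant could hold all
        # copies of some PG, so only replica_size - 1 simultaneous losses
        # are guaranteed safe.
        return replica_size - 1

    def max_sequential_losses(num_hosts, replica_size):
        # If recovery fully completes between failures (and space allows),
        # you can keep losing hosts until only replica_size hosts remain.
        return num_hosts - replica_size

    print(max_simultaneous_losses(3))   # 2
    print(max_sequential_losses(6, 3))  # 3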

However, your cluster performance will be impacted while this happens (data
being moved and re-replicated), probably to the point of being unusable due
to the high IO load.
Restoring replica 3 will of course need space on your surviving nodes,
space that you may not have if they were more than 50% full.
That leads to a full OSD situation, which will lock up your cluster and
make it unusable until resolved (the failed node restored, or new nodes
and OSDs added).
And when you restore the failed nodes, the backfilling IO load may bring
your cluster to its knees again.
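
As a worked example of that space math (again just a sketch; 0.95 is
Ceph's default mon_osd_full_ratio and 0.85 the default nearfull ratio,
adjust for your configuration):

    # Sketch: utilization of the survivors after losing k of n equal-sized
    # hosts, assuming the lost data re-replicates evenly across the rest.
    def post_failure_utilization(utilization, n_hosts, lost_hosts):
        return utilization * n_hosts / (n_hosts - lost_hosts)

    FULL_RATIO = 0.95  # default mon_osd_full_ratio

    # 6 hosts at 50% full, lose 3: survivors hit 100%, well past FULL_RATIO.
    print(post_failure_utilization(0.50, 6, 3))  # 1.0
    # Even at 45% full you land at ~90%, over the 85% nearfull mark.
    print(post_failure_utilization(0.45, 6, 3))  # ~0.9

That's where the "more than 50% full" figure above comes from: lose half
the hosts and survivor utilization doubles.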

So again, the simple answer is 3.
The real answer is "it depends", and you do want to test all kinds of
failure modes (with a non-empty cluster) before going into production.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

