Hello,

On Thu, 4 Sep 2014 22:51:35 +0000 Zojonc, Josh wrote:

> I'm trying to figure out if it's possible to compute how many nodes I
> can lose. I've got a cluster of 6 boxes with 9 disks each, for a total
> of 54 OSDs. There are a total of 1800 placement groups on those OSDs.
> The replica size is 3.
>
> Is there a way to figure it out, or is more information needed?

3 storage nodes (the replica size), if you're using the default CRUSH map or one that at the very least distributes PGs onto separate hosts. This assumes that your monitors (at least 3) are on different machines, or that all your OSD servers have a MON plus one external MON.

However, your cluster performance will be impacted by this (data being moved and re-replicated), probably to the point of being unusable due to the high IO load.

And that restoration of replica 3 will of course need space on your surviving nodes, space that you may not have if they were more than 50% full. That leads to a full-OSD situation, which will lock up your cluster and make it unusable until resolved (a node restored, new nodes or OSDs added). And when restoring failed nodes, the backfilling IO load may bring your cluster to its knees again.

So again, the simple answer is 3. The real answer is "it depends", and you do want to test all kinds of failure modes (with a non-empty cluster) before going into production.

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
chibi at gol.com        Global OnLine Japan/Fusion Communications
http://www.gol.com/
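
The capacity point above (restoring replica 3 needs room on the surviving nodes, which a >50%-full cluster may not have) can be sketched as a rough back-of-the-envelope check. The per-OSD size, usage fraction, and full ratio below are illustrative assumptions, not numbers from the thread, and `can_rereplicate` is a hypothetical helper, not a Ceph API:

```python
# Rough sketch: after losing some hosts, can the survivors hold a full
# copy of all data at the configured replica count without hitting the
# full ratio (which would lock up the cluster)?
# All concrete numbers are illustrative assumptions.

def can_rereplicate(total_hosts, osds_per_host, osd_size_tb,
                    used_fraction, hosts_lost, replica=3,
                    full_ratio=0.95):
    """Return True if the surviving hosts can absorb re-replication."""
    if total_hosts - hosts_lost < replica:
        # Not enough host failure domains left to place all replicas.
        return False
    total_raw = total_hosts * osds_per_host * osd_size_tb
    stored_raw = total_raw * used_fraction  # raw usage incl. all replicas
    surviving_raw = (total_hosts - hosts_lost) * osds_per_host * osd_size_tb
    # After recovery the survivors must hold all the raw data while
    # staying below the full ratio on every OSD (assumed even spread).
    return stored_raw <= surviving_raw * full_ratio

# 6 hosts x 9 OSDs (assumed 4 TB each), cluster 50% full:
print(can_rereplicate(6, 9, 4.0, 0.50, hosts_lost=2))  # fits
print(can_rereplicate(6, 9, 4.0, 0.50, hosts_lost=3))  # does not fit
```

This is only a capacity check under an even-distribution assumption; in practice CRUSH placement is uneven, so you want headroom well below the full ratio.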