On 04/15/2013 03:25 AM, Stas Oskin wrote:
Hi,

Like I said, it's just my instinct. For a 180TB (raw) cluster you've got some tough choices to make. Some options might include:

1) High density and low cost by just sticking a bunch of 3TB drives in 5 2U nodes and making sure you don't fill the cluster past ~75% (which you probably don't want to do from a performance perspective anyway). Just acknowledge that during failure/recovery there's going to be a ton of traffic flying around between the remaining 4 nodes.

2) Lower density (1-2TB) drives and more 2U nodes for higher performance but lower density and greater expense.

3) High eventual density and low eventual cost by buying 2U nodes that are only partially filled with 3TB drives, with the assumption that the cluster is going to grow larger down the road.

4) 15 4-drive 1U nodes for less impact during recovery but greater expense and lower density.

All of these options have benefits and downsides. For a production cluster I'd want more than 5 nodes, but it wouldn't be the only consideration (cost, density, performance, etc. would all play a part).

To summarize, you recommend focusing on 2U servers rather than 4U (HP, SuperMicro and so on), and the best strategy seems to be to start filling them with 3TB disks, spreading them evenly over the servers.
It's not so much about the chassis size as about how much capacity you lose (and how much data you have to re-replicate) on the rest of the cluster during an outage. An SL4500 chassis with 2 nodes is going to behave differently than an SL4500 chassis with 1 node, even though in both cases the package is still "4U". You lose density with 2 nodes per chassis, but you double the overall number of nodes and potentially improve performance.

If you'd like more in-depth recommendations about your cluster design, we (Inktank) do provide consulting services to look at your specific requirements and help you weigh all of these different factors when building your cluster.
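To put rough numbers on the "capacity you lose and data you have to re-replicate" point, here is a minimal back-of-the-envelope sketch. It is not a Ceph tool: the `Layout` class and `recovery_load` function are hypothetical names made up for illustration, the 12 drives per 2U node is an assumption inferred from 180TB / 5 nodes / 3TB, and it assumes equal-sized nodes with the lost data spread evenly over the survivors (it ignores CRUSH placement, replica count, and network limits).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of one cluster layout (illustration only)."""
    name: str
    nodes: int
    drives_per_node: int
    drive_tb: float

    @property
    def node_tb(self) -> float:
        # Raw capacity held by a single node.
        return self.drives_per_node * self.drive_tb

    @property
    def raw_tb(self) -> float:
        # Raw capacity of the whole cluster.
        return self.nodes * self.node_tb


def recovery_load(layout: Layout, fill: float) -> tuple[float, float]:
    """Return (TB stored on the failed node, TB each survivor must absorb).

    Assumes one node fails and its data is re-replicated evenly
    across the remaining nodes.
    """
    if layout.nodes < 2:
        raise ValueError("need at least 2 nodes to recover from a node loss")
    lost_tb = layout.node_tb * fill
    return lost_tb, lost_tb / (layout.nodes - 1)


# Two of the layouts mentioned in the thread; 12 drives per 2U node is assumed.
LAYOUTS = [
    Layout("5 x 2U, 12 x 3TB", nodes=5, drives_per_node=12, drive_tb=3.0),
    Layout("15 x 1U, 4 x 3TB", nodes=15, drives_per_node=4, drive_tb=3.0),
]

for layout in LAYOUTS:
    lost, per_node = recovery_load(layout, fill=0.75)
    print(f"{layout.name}: raw {layout.raw_tb:.0f}TB, "
          f"re-replicate {lost:.1f}TB, {per_node:.2f}TB per surviving node")
```

Both layouts give the same 180TB raw, but under these assumptions the 1U layout has a third as much data to move after a node failure and spreads it over 14 survivors instead of 4.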
By the way, why are 5 servers so important? Why not 3 or 7, for that matter?
It's not really. You just said 180TB, so with 2U servers and 3TB drives you can do that in 5 nodes. You could also do that in 15-20 1U nodes, or 2 36-drive 4U nodes. The fewer servers you have, the greater the impact of a server outage. I.e., if you have 2 36-drive servers and you lose one, you've lost half the cluster capacity, which is a big deal if you were already over 50% disk utilization. The trade-off is that 20 1U servers take up a lot more space and cost more than 2 4U boxes.
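The "half the capacity at over 50% utilization" case generalizes to any node count. The sketch below is only the arithmetic, under the assumption that all nodes are the same size and the surviving nodes have to hold all of the data afterwards; the function names are hypothetical, and real clusters will hit their configured full thresholds before reaching 100%.

```python
def utilization_after_loss(nodes: int, utilization: float, failed: int = 1) -> float:
    """Utilization of the surviving nodes once `failed` nodes are gone.

    Assumes equal-sized nodes and that all data ends up on the survivors.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return utilization * nodes / survivors


def max_safe_utilization(nodes: int, ceiling: float = 1.0, failed: int = 1) -> float:
    """Highest pre-failure utilization that stays at or under `ceiling` afterwards."""
    if nodes - failed <= 0:
        raise ValueError("no surviving nodes")
    return ceiling * (nodes - failed) / nodes


# Node counts mentioned in the thread: 2 x 4U, 5 x 2U, 20 x 1U.
for n in (2, 5, 20):
    print(f"{n:2d} nodes: 75% full becomes "
          f"{utilization_after_loss(n, 0.75):.1%} after one node loss; "
          f"stay under {max_safe_utilization(n):.0%} to survive it")
```

With 2 nodes, anything over 50% cannot be absorbed by the survivor. With 5 nodes, a cluster that is 75% full lands at about 94% after one node loss, and with 20 nodes the same loss only pushes it to about 79%.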
Mark
Thanks again, Stas.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com