Re: Cost- and Power-efficient OSD-Nodes

On Tuesday, April 28, 2015, Dominik Hannen <hannen@xxxxxxxxx> wrote:
Hi ceph-users,

I am currently planning a cluster and would like some input specifically about the storage-nodes.

The non-OSD daemons will be running on more powerful systems.

Interconnect as currently planned:
4 x 1Gbit LACP Bonds over a pair of MLAG-capable switches (planned: EX3300)


One problem with LACP is that it will only allow you to have 1Gbps between any two IPs or MACs (depending on your switch config). This will most likely limit the throughput of any client to 1Gbps, which is equivalent to 125MBps storage throughput.  It is not really equivalent to a 4Gbps interface or 2x 2Gbps interfaces (if you plan to have a client network and cluster network). 
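
To put numbers on that, here is a quick Python sketch (illustrative figures only, not measurements) of the aggregate versus single-flow bandwidth of a 4x 1Gb bond:

# Illustrative numbers for a 4 x 1Gb LACP bond. LACP hashes each flow
# (by IP or MAC, depending on switch config) onto a single member link,
# so any one client<->OSD flow is capped at one link's speed.

GBIT_TO_MBYTE_PER_S = 1000 / 8   # 1 Gbit/s ~= 125 MB/s

links = 4
link_speed_gbit = 1.0

aggregate = links * link_speed_gbit * GBIT_TO_MBYTE_PER_S   # all flows combined
single_flow = link_speed_gbit * GBIT_TO_MBYTE_PER_S         # what one client sees

print(f"Aggregate bond capacity: {aggregate:.0f} MB/s")
print(f"Single-flow ceiling:     {single_flow:.0f} MB/s")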

So far I would go with Supermicro's 5018A-MHN4 offering; rack space is not really a concern, so only 4 OSDs per U is fine.
(The cluster is planned to start with 8 osd-nodes.)

osd-node:
Avoton C2758 - 8 x 2.40GHz
16 GB RAM ECC
16 GB SSD - OS - SATA-DOM
250GB SSD - Journal (MX200 250GB with extreme over-provisioning, staggered deployment, monitored for TBW-Value)
4 x 3 TB OSD - Seagate Surveillance HDD (ST3000VX000) 7200rpm 24/7
4 x 1 Gbit

per-osd breakdown:
3 TB HDD
2 x 2.40GHz (Avoton-Cores)
4 GB RAM
8 GB SSD-Journal (~125 MB/s r/w)
1 Gbit

The main question is: will the Avoton CPU suffice? (I reckon the common 1GHz/OSD suggestion is in regard to much more powerful CPUs.)

I don't have any experience with this CPU, but 8x 2.4GHz cores for 4 OSDs seems like plenty of CPU. 

I have 32GB of RAM for 7 OSDs, which has been enough for me. 
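
For what it's worth, a quick Python sanity check of the proposed node against the usual rules of thumb (the ~1 GHz/OSD and ~1-2 GB RAM/OSD figures are rough guidelines, not hard requirements):

# Per-OSD resource check for the proposed Avoton node. The ~1 GHz/OSD
# and ~1-2 GB RAM/OSD figures are common rules of thumb, not hard
# requirements; more RAM helps during recovery.

cores, core_ghz = 8, 2.4     # Avoton C2758
ram_gb = 16
osds_per_node = 4

print(f"CPU per OSD: {cores * core_ghz / osds_per_node:.1f} GHz")
print(f"RAM per OSD: {ram_gb / osds_per_node:.1f} GB")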

Are there any cost-effective suggestions to improve this configuration?

I have implemented a small cluster with no SSD journals, and the performance is pretty good.

With 42 OSDs, 3x replication, and 40Gb NICs, rados bench shows me 2000 IOPS at 4k writes and 500 MB/s at 4M writes. 
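
Converting those bench figures between IOPS and MB/s, just to put them side by side (a quick sketch using the numbers above):

# Convert the rados bench figures above between IOPS and MB/s so they
# can be compared directly.

def iops_to_mb_per_s(iops, block_bytes):
    return iops * block_bytes / 1e6

def mb_per_s_to_iops(mb_per_s, block_bytes):
    return mb_per_s * 1e6 / block_bytes

print(f"4k writes: 2000 IOPS  ~= {iops_to_mb_per_s(2000, 4 * 1024):.0f} MB/s")
print(f"4M writes: 500 MB/s   ~= {mb_per_s_to_iops(500, 4 * 1024 * 1024):.0f} IOPS")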

I would trade your SSD journals for 10Gb NICs and switches.  I started out with the same 4x 1Gb LACP config, and things like rebalancing/recovery were terribly slow, on top of the throughput limit I mentioned above. 
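
As a rough illustration of why the network matters for recovery, here is a back-of-the-envelope Python sketch (the 50% usable-bandwidth figure is an assumption, and real recovery fans out across many nodes, so treat it as order-of-magnitude only):

# Very rough estimate of how long re-replicating one failed 3 TB OSD's
# worth of data takes at different per-node network speeds. The 50%
# usable-bandwidth fraction is an assumption, not a measurement.

def recovery_hours(data_tb, nic_gbit, usable_fraction=0.5):
    data_bytes = data_tb * 1e12
    rate = nic_gbit * 1e9 / 8 * usable_fraction   # bytes per second
    return data_bytes / rate / 3600

for nic in (1, 10):
    print(f"{nic:>2} Gb/s: ~{recovery_hours(3.0, nic):.1f} h to move 3 TB")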

When you get more funding next quarter/year, you can choose to add the SSD journals or more OSD nodes. Moving to 10Gb networking after you get the cluster up and running will be much harder. 


Will erasure coding be a feasible possibility?

Does it hurt to run OSD-nodes CPU-capped, if you have enough of them?

___
Dominik Hannen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
