Hardware recommendation / calculation for large cluster

Hi,

First of all, I am new to Ceph and to this mailing list. At the moment I am 
looking into the possibility of getting involved in the storage business. I am 
trying to get an estimate of the costs, and after that I will work out how to 
generate sufficient income.

First I will describe my case; you will find my questions at the bottom.


GENERAL LAYOUT:

Part of this cost calculation is of course hardware. For the larger part I've 
already figured it out. In my plans I will be leasing a full rack (46U). 
Depending on what else needs to go into the rack, I will be using 36 or 40U for 
OSD storage servers. (I will assume 36U from here on, to keep a fixed value for 
the calculation and to leave enough spare space for extra devices.)

Each OSD server uses 4U and can take 36x 3.5" drives. So in 36U I can put 
36/4 = 9 OSD servers, containing 9*36 = 324 HDDs.
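
To make the layout arithmetic explicit, here is a small Python sketch using 
only the numbers above:

    # Rack layout sketch: 4U, 36-bay OSD servers in the 36U reserved for storage.
    rack_u = 46           # full rack
    storage_u = 36        # reserved for OSD servers
    server_u = 4          # height of one 36-bay chassis
    bays_per_server = 36

    osd_servers = storage_u // server_u            # 9 servers
    total_hdds = osd_servers * bays_per_server     # 324 drives
    spare_u = rack_u - osd_servers * server_u      # 10U left for switches, MON/MDS, etc.
    print(osd_servers, total_hdds, spare_u)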


HARD DISK DRIVES

I have been looking at the Western Digital RE and Red series. RE is more 
expensive per GB, but has a higher MTBF and offers a 4TB model. Red is really 
cheap per GB, but only goes as far as 3TB.

In my current calculations it does not matter much whether I use the expensive 
WD RE 4TB disks or the cheaper WD Red 3TB disks: the price per GB over the 
complete cluster expense plus three years of running costs (including AFR) is 
almost the same.

So basically, if I can reduce the cost of all the other components used in the 
cluster, I will go for the 3TB disks, and if the costs turn out higher than in 
my first calculation, I will use the 4TB disks.

Let's assume 4TB from now on. So, 4*324 = 1296TB. So let's go petabyte ;).
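
The capacity side of that, as a sketch. Note that the usable number depends on 
the replication factor; the 3 replicas below are only an assumption for 
illustration, not something I have decided on:

    # Raw and (assumed) usable capacity.
    hdds = 324
    tb_per_hdd = 4
    replicas = 3                       # assumption for illustration only

    raw_tb = hdds * tb_per_hdd         # 1296 TB raw
    usable_tb = raw_tb / replicas      # 432 TB if every object is stored 3 times
    print(raw_tb, usable_tb)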


NETWORK

I will use a redundant 2x 10GbE network connection for each node. Two 
independent 10GbE switches will be used, and I will use bonding between the 
interfaces on each node. (Thanks to some guy on the #ceph IRC channel for 
pointing this option out.) I will use VLANs to split the front-side, back-side 
and Internet networks.
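
As a rough sanity check on the network side (the per-disk throughput is an 
assumption on my part, and the bonded figure assumes both links actually carry 
traffic):

    # Back-of-the-envelope throughput per OSD node.
    nics = 2
    gbit_per_nic = 10
    disks_per_node = 36
    mb_per_s_per_disk = 120                               # assumed sequential rate per HDD

    net_bonded = nics * gbit_per_nic * 1000 / 8           # ~2500 MB/s if both links are used
    net_failover = gbit_per_nic * 1000 / 8                # ~1250 MB/s with active-backup bonding
    disk_aggregate = disks_per_node * mb_per_s_per_disk   # ~4320 MB/s across all spindles
    print(net_bonded, net_failover, disk_aggregate)

So on paper the spindles could outrun the network for large sequential 
workloads; for random I/O the per-disk rate will of course be much lower.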


OSD SERVER

Supermicro based, with 36 hot-swap HDD bays and a dual-socket mainboard with 
16 DIMM sockets. It is advertised to take up to 512GB of RAM. I will install 
2x Intel Xeon E5620 2.40GHz processors, each having 4 cores and 8 threads. 
About the RAM I am in doubt (see below). I am looking into running 1 OSD per 
disk.
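
Spelled out, that gives the following CPU-to-OSD ratio per storage node:

    # CPU resources per OSD daemon on one storage node.
    sockets = 2
    cores_per_cpu = 4
    threads_per_core = 2
    osds_per_node = 36                                       # 1 OSD per disk

    cores = sockets * cores_per_cpu                          # 8 physical cores
    threads = cores * threads_per_core                       # 16 hardware threads
    print(osds_per_node / cores, osds_per_node / threads)    # 4.5 OSDs per core, 2.25 per thread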


MON AND MDS SERVERS

Now comes the big question: what specs are required? At first I had the plan to 
use 4 Supermicro SuperServers, with 4-socket mainboards that can take the new 
16-core AMD processors and up to 1TB of RAM.

I want all 4 of these servers to run a MON service, an MDS service and 
customer/public services. I would probably use VMs (KVM) to separate them. I 
will compile my own kernel to enable Kernel Samepage Merging, hugepage support 
and memory compaction to make RAM use more efficient. The requirements for my 
public services will be added on top, once I know what I need for MON and MDS.


RAM FOR ALL SERVERS

So what would you estimate the RAM usage to be? I have looked at 
http://ceph.com/docs/master/install/hardware-recommendations/#minimum-hardware-recommendations.

The OSD part sounds OK: 500MB per daemon would put the minimum RAM requirement 
for my OSD servers at 18GB, so 32GB should be more than enough. Although I 
would like to know whether it is possible to use btrfs compression; in that 
case I'd need more RAM in there.
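
The arithmetic behind those numbers, including the headroom 32GB would leave 
(which is what I would hope to spend on things like page cache or btrfs 
compression):

    # Minimum OSD RAM per node based on 500 MB per daemon, and headroom with 32 GB installed.
    osds_per_node = 36
    mb_per_osd = 500
    installed_gb = 32

    min_gb = osds_per_node * mb_per_osd / 1000     # 18 GB minimum
    headroom_gb = installed_gb - min_gb            # ~14 GB for page cache / compression
    print(min_gb, headroom_gb)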

What I really want to know: how much RAM do I need for the MON and MDS servers? 
1GB per daemon sounds pretty steep. As everybody knows, RAM is expensive!

In my case I would need at least 324GB of RAM for each of them. Initially I was 
planning to use 4 servers, each of them running both. Combining those in a 
single system, together with the other duties the system has to perform, I 
would need the full 1TB of RAM. I would need to use 32GB modules, which are 
really expensive per GB and difficult to find (not many server hardware vendors 
in the Netherlands have them).
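
To show where these numbers come from, under my (possibly wrong) reading that 
the 1GB-per-daemon figure scales with the number of OSD daemons in the cluster:

    # MON/MDS RAM estimate, assuming 1 GB per daemon scales with the cluster's OSD count.
    cluster_osds = 324
    gb_per_daemon = 1

    mon_gb = cluster_osds * gb_per_daemon          # 324 GB for a MON
    mds_gb = cluster_osds * gb_per_daemon          # 324 GB for an MDS
    print(mon_gb + mds_gb)                         # 648 GB before my public services are added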


QUESTIONS

Question 1: Is it really the number of OSDs that counts for MON and MDS RAM 
usage, or the size of the object store?

Question 2: Can I do it with less RAM? Any statistics, or better, a 
calculation? I can imagine memory pages becoming redundant as the cluster 
grows, so that less memory is required per OSD.

Question 3: If it is the number of OSDs that counts, would it be beneficial to 
combine disks into a RAID 0 (LVM or btrfs) array?
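
For example, with some hypothetical group sizes, just to show the effect on the 
daemon count (and thus on the RAM estimate from above):

    # Effect of grouping disks into one OSD (e.g. RAID 0 via LVM or btrfs).
    total_disks = 324
    gb_per_daemon = 1                  # same per-daemon reading as above

    for disks_per_osd in (1, 2, 3):    # hypothetical group sizes
        osds = total_disks // disks_per_osd
        print(disks_per_osd, osds, osds * gb_per_daemon)   # group size, OSD daemons, GB per MON/MDS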

Question 4: Is it safe / possible to store the MON files inside the cluster 
itself? The 10GB-per-daemon requirement would mean I need 3240GB of storage for 
each MON, meaning I would need to get some huge disks and an (LVM) RAID 1 array 
for redundancy, while I already have a huge redundant file system at hand.

Question 5: Is it possible to enable btrfs compression? I know btrfs is not yet 
considered stable for production, but it would be nice if compression were 
supported in the future, once it does become stable.

If the RAM requirement is not so steep, I am thinking about the possibility of 
running the MON service on 4 of the OSD servers. Upgrading them to 16x 16GB of 
RAM would give me 256GB of RAM. (Again, 32GB modules are too expensive and not 
an option.) This would make 2 of the SuperServers obsolete, decrease the 
workload of the remaining ones, and keep some spare computing power for future 
growth. The only reason I needed them was their RAM capacity.

Getting rid of 2 SuperServers will provide me with enough space to fit a 10th 
storage server. This will considerably reduce the total cost per GB of this 
cluster. (Comparing all the hardware without the HDDs, the 4 SuperServers are 
the most expensive part.)
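
For completeness, the numbers behind this alternative layout:

    # Alternative layout: MONs co-located on OSD servers (16 GB DIMMs) and a 10th storage server.
    dimm_slots = 16
    gb_per_dimm = 16
    ram_per_osd_node_gb = dimm_slots * gb_per_dimm      # 256 GB per upgraded OSD node

    extra_raw_tb = 36 * 4                               # +144 TB raw from the 10th server
    total_raw_tb = 10 * 36 * 4                          # 1440 TB raw with 10 servers
    print(ram_per_osd_node_gb, extra_raw_tb, total_raw_tb)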

I completely understand if you think: hey, that kind of thing calls for 
corporate advice, etc. Please understand that I am just an individual, working 
his (non-IT) job, who has Linux and open source as a hobby. I have just started 
brainstorming about some business opportunities. If this story turns out to be 
feasible, I will use this information to make a business and investment plan 
and look for investors.

Thanks and best regards,

Tim Mohlmann


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



