Re: Hardware recommendation / calculation for large cluster

Hi,

On Saturday 11 May 2013 16:04:27 Leen Besselink wrote:
 
> Someone is going to correct me if I'm wrong, but I think you misread
> something.
>
>
> The Mon-daemon doesn't need that much RAM:
> 
> The 'RAM: 1 GB per daemon' is per Mon-daemon, not per OSD-daemon.
> 
Gosh, I feel embarrassed. This actually was my main concern / bottleneck. 
Thanks for pointing this out. It seems Ceph really shines when it comes to 
deploying affordable data clusters.
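
For anyone else who misread it the way I did, a rough back-of-the-envelope
check (the monitor count below is an assumption, not something stated in this
thread; Python):

    osds = 324                # one OSD daemon per disk, 9 servers x 36 bays
    mons = 5                  # an assumed small monitor quorum
    osd_ram_gb = osds * 0.5   # ~500 MB per OSD daemon -> ~162 GB, spread over 9 servers
    mon_ram_gb = mons * 1.0   # 1 GB per MON daemon -> ~5 GB in total
    print(osd_ram_gb, mon_ram_gb)   # 162.0 5.0

So the monitor RAM budget is a handful of gigabytes, not hundreds.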

Regards, Tim

> On Sat, May 11, 2013 at 03:42:59PM +0200, Tim Mohlmann wrote:
> > Hi,
> > 
> > First of all, I am new to Ceph and this mailing list. At this moment I am
> > looking into the possibility of getting involved in the storage business. I
> > am trying to get an estimate of the costs, and after that I will start to
> > determine how to generate sufficient income.
> > 
> > First I will describe my case, at the bottom you will find my questions.
> > 
> > 
> > GENERAL LAYOUT:
> > 
> > Part of this cost calculation is of course hardware. For the larger part
> > I've already figured it out. In my plans I will be leasing a full rack
> > (46U). Depending on the rack's other needs I will be using 36 or 40U for OSD
> > storage servers. (I will assume 36U from here on, to keep a solid value
> > for the calculation and leave enough spare space for extra devices.)
> > 
> > Each OSD server uses 4U and can take 36x3.5" drives. So in 36U I can put
> > 36/4=9 OSD servers, containing 9*36=324 HDDs.
> > 
> > 
> > HARD DISK DRIVES
> > 
> > I have been looking at the WD RE and WD Red series. RE is more expensive
> > per GB, but has a larger MTBF and offers a 4TB model. Red is (really) cheap
> > per GB, but only goes as far as 3TB.
> > 
> > By my current calculations it does not matter much whether I use the
> > expensive WD RE 4TB disks or the cheaper WD Red 3TB ones: the price per GB
> > over the complete cluster expense and 3 years of running costs (including
> > AFR) is almost the same.
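> > 
> > To make that comparison concrete, the kind of calculation I mean looks
> > roughly like this (Python; every number below is a placeholder, not a real
> > quote):
> > 
> >     def cost_per_tb_3yr(drive_price, capacity_tb, afr, overhead_per_bay):
> >         # drive price, plus expected replacements over 3 years (AFR * 3),
> >         # plus the share of server/network/rack cost amortised per drive bay
> >         return (drive_price * (1 + 3 * afr) + overhead_per_bay) / capacity_tb
> > 
> >     # placeholder prices and AFR values, just to show the shape of the comparison
> >     print(cost_per_tb_3yr(380.0, 4, 0.01, 400.0))  # "RE-like" 4TB placeholder
> >     print(cost_per_tb_3yr(140.0, 3, 0.03, 400.0))  # "Red-like" 3TB placeholder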
> > 
> > So basically, if I can reduce the costs of all the other components used
> > in the cluster, I will go for the 3TB disks, and if the costs turn out
> > higher than my first calculation, I will use the 4TB disks.
> > 
> > Let's assume 4TB from now on. So, 4*324=1296TB. So let's go petabyte ;).
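> > 
> > In numbers (just restating the figures above in Python):
> > 
> >     servers = 36 // 4        # 4U per OSD server in 36U -> 9 servers
> >     drives  = servers * 36   # 36 bays each -> 324 drives
> >     raw_tb  = drives * 4     # 4TB drives -> 1296 TB raw
> >     print(servers, drives, raw_tb)   # 9 324 1296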
> > 
> > 
> > NETWORK
> > 
> > I will use a redundant 2x10GbE network connection for each node. Two
> > independent 10GbE switches will be used, and I will use bonding between the
> > interfaces on each node. (Thanks to the person in the #ceph IRC channel for
> > pointing this option out.) I will use VLANs to split the front-side,
> > back-side and Internet networks.
> > 
> > 
> > OSD SERVER
> > 
> > SuperMicro based, 36 hot-swap 3.5" bays, dual-socket mainboard with 16 DIMM
> > sockets. It is advertised to take up to 512GB of RAM. I will install 2x
> > Intel Xeon E5620 2.40GHz processors, each with 4 cores and 8 threads.
> > About the RAM I am in doubt (see below). I am looking into running 1 OSD
> > per disk.
> > 
> > 
> > MON AND MDS SERVERS
> > 
> > Now comes the big question: what specs are required? At first I planned to
> > use 4 SuperMicro SuperServers, with 4-socket mainboards that can take the
> > new 16-core AMD processors and up to 1TB of RAM.
> > 
> > I want all 4 of those servers to run a MON service, an MDS service and
> > customer / public services. I would probably use VMs (KVM) to separate
> > them. I will compile my own kernel to enable Kernel Samepage Merging,
> > hugepage support and memory compaction to make RAM use more efficient. The
> > requirements for my public services will be added on top, once I know what
> > I need for MON and MDS.
> > 
> > 
> > RAM FOR ALL SERVERS
> > 
> > So what would you estimate the RAM usage to be?
> > http://ceph.com/docs/master/install/hardware-recommendations/#minimum-hardware-recommendations
> > 
> > Sounds OK for the OSD part. 500 MB per daemon would put the minimum RAM
> > requirement for my OSD servers at 18GB, so 32GB should be more than enough.
> > Although I would like to know whether it is possible to use btrfs
> > compression; in that case I'd need more RAM in there.
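> > 
> > As a quick sanity check on that (assuming one OSD daemon per disk, as
> > planned above; Python):
> > 
> >     osds_per_server = 36
> >     min_ram_gb = osds_per_server * 0.5   # 500 MB per daemon -> 18 GB minimum
> >     print(min_ram_gb)                    # 18.0, so 32 GB leaves decent headroom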
> > 
> > What I really want to know is: how much RAM do I need for the MON and MDS
> > servers? 1GB per daemon sounds pretty steep. As everybody knows, RAM is
> > expensive!
> > 
> > In my case I would need at least 324 GB of RAM for each of them. Initially
> > I was planning to use 4 servers, each of them running both. Adding in the
> > other duties these systems have to perform, I would need the full 1TB of
> > RAM. I would need to use 32GB modules, which are really expensive per GB
> > and difficult to find (not many server hardware vendors in the Netherlands
> > have them).
> > 
> > 
> > QUESTIONS
> > 
> > Question 1: Is it really the number of OSDs that counts for MON and MDS
> > RAM usage, or the size of the object store?
> > 
> > Question 2: Can I do it with less RAM? Any statistics, or better, a
> > calculation? I can imagine memory pages becoming redundant as the cluster
> > grows, so less memory would be required per OSD.
> > 
> > Question 3: If it is the number of OSDs that counts, would it be
> > beneficial to combine disks into a RAID 0 (LVM or btrfs) array?
> > 
> > Question 4: Is it safe / possible to store the MON files inside the
> > cluster itself? The 10GB per daemon requirement would mean I need 3240GB
> > of storage for each MON, meaning I would need to get some huge disks and
> > an (LVM) RAID 1 array for redundancy, while I have a huge redundant file
> > system at hand already.
> > 
> > Question 5: Is it possible to enable btrfs compression? I know btrfs is
> > not stable for production yet, but it would be nice if compression were
> > supported in the future, when it does become stable.
> > 
> > If the RAM requirement is not so steep, I am thinking about the
> > possibility of running the MON service on 4 of the OSD servers. Upgrading
> > them to 16x16GB of RAM would give me 256GB of RAM. (Again, 32GB modules
> > are too expensive and not an option.) This would make 2 of the
> > superservers unnecessary, decrease their workload, and keep some spare
> > computing power for future growth. The only reason I needed them was RAM
> > capacity.
> > 
> > Getting rid of 2 superservers will provide me with enough space to fit a
> > 10th storage server. This will considerably reduce the total cost per GB
> > of this cluster. (Comparing all the hardware without the HDDs, the 4
> > superservers are the most expensive part.)
> > 
> > I completely understand if you think: hey! That kind of thing should be
> > paid corporate advice, etc. Please understand I am just an individual,
> > working his (non-IT) job, who has Linux and open source as a hobby. I just
> > started brainstorming on some business opportunities. If this turns out to
> > be feasible, I will use this information to make a business and investment
> > plan and look for investors.
> > 
> > Thanks and best regards,
> > 
> > Tim Mohlmann
> > 
> > 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



