Re: Ceph Supermicro hardware recommendation


 



If it's of any interest, we are building our cluster with these:-

http://www.supermicro.nl/products/system/4U/F617/SYS-F617H6-FTPT_.cfm

It seemed to us that with 2U servers quite a fair chunk of the cost goes on the metal case and redundant power supplies. The storage-optimised FatTwin seemed a good compromise. It does mean that disk replacements require taking the node offline, but they will be in a remote colo, so frequent disk replacements are out of the question anyway, and we plan to replace disks during planned maintenance visits. Also, having double the disk density of 2U servers means we can fit more into a costly rack.

They also have on-board 10GBase-T NICs, which keeps the networking cost low.

I'm not recommending them to you, as everybody's requirements are different, but I thought you might find the insight interesting.

Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Colombo Marco
Sent: 04 February 2015 09:20
To: Christian Balzer; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Ceph Supermicro hardware recommendation

Hi Christian,



On 04/02/15 02:39, "Christian Balzer" <chibi@xxxxxxx> wrote:

>On Tue, 3 Feb 2015 15:16:57 +0000 Colombo Marco wrote:
>
>> Hi all,
>>  I have to build a new Ceph storage cluster. After I've read the 
>> hardware recommendations and some mail from this mailing list, I would 
>> like to buy these servers:
>> 
>
>Nick mentioned a number of things already I totally agree with, so 
>don't be surprised if some of this feels like a repeat.
>
>> OSD:
>> SSG-6027R-E1R12L ->
>> http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
>> Intel Xeon e5-2630 v2 64 GB RAM
>As Nick said, v3 and more RAM might be helpful, depending on your use 
>case (small writes versus large ones) even faster CPUs as well.

Ok, we'll switch from v2 to v3 and from 64 to 96 GB of RAM.

>
>> LSI 2308 IT
>> 2 x SSD Intel DC S3700 400GB
>> 2 x SSD Intel DC S3700 200GB
>Why the separation of SSDs? 
>They aren't going to be that busy with regards to the OS.

We would like to use the 400GB SSDs for a cache pool and the 200GB SSDs for journaling.

>
>Get a case like Nick mentioned with 2 2.5 bays in the back, put 2 DC 
>S3700 400GBs in there (connected to onboard 6Gb/s SATA3), partition 
>them so that you have a RAID1 for OS and plain partitions for the 
>journals of the now 12 OSD HDDs in your chassis. 
>Of course this optimization in terms of cost and density comes with a 
>price, if one SSD should fail, you will have 6 OSDs down.
>Given how reliable the Intels are this is unlikely, but something you 
>need to consider.
>
>If you want to limit the impact of a SSD failure and have just 2 OSD 
>journals per SSD, get a chassis like the one above and 4 DC S3700 
>200GB,
>RAID10 them for the OS and put 2 journal partitions on each. 
>
>I did the same with 8 3TB HDDs and 4 DC S3700 100GB; the HDDs (and the 
>CPU, with 4KB IOPS) are the limiting factor, not the SSDs.
>
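A rough sanity check of the layout described above: two 400GB DC S3700s, a small mirrored OS partition on each, and six journal partitions per SSD for the 12 OSD HDDs. The OS and journal partition sizes below are assumptions for illustration only, not figures from the thread.

```python
# Back-of-the-envelope layout for one of the two 400 GB journal SSDs.
# OS_PARTITION_GB and JOURNAL_GB are assumed values, not from the thread.

SSD_GB = 400
OS_PARTITION_GB = 50      # mirrored (RAID1) across both SSDs for the OS
JOURNALS_PER_SSD = 6      # 12 OSD HDDs spread over 2 SSDs
JOURNAL_GB = 10           # a common filestore journal size

used = OS_PARTITION_GB + JOURNALS_PER_SSD * JOURNAL_GB
spare = SSD_GB - used     # left unpartitioned as over-provisioning

print(f"used per SSD: {used} GB, spare for over-provisioning: {spare} GB")
```

The generous spare space is deliberate: leaving a large part of the SSD unpartitioned helps write endurance and steady-state performance.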
>> 8 x HDD Seagate Enterprise 6TB
>Are you really sure you need that density? One disk failure will result 
>in a LOT of data movement once these become somewhat full.
>If you were to go for a 12 OSD node as described above, consider 4TB 
>ones for the same overall density, while having more IOPS and likely 
>the same price or less.

We chose the 6TB disks because we need a lot of storage in a small number of servers, and we prefer servers without too many disks.
However, we plan to fill each 6TB disk to at most 80%.
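As a quick capacity check under the 80% fill target above (the replication factor of 3 is an assumption for illustration, not stated in the thread):

```python
# Per-node capacity for the 8 x 6 TB option with an 80% fill target.
# REPLICATION = 3 is an assumed pool size, not a figure from the thread.

DISKS_PER_NODE = 8
DISK_TB = 6
FILL_TARGET = 0.8
REPLICATION = 3

usable_per_node = DISKS_PER_NODE * DISK_TB * FILL_TARGET   # TB, raw usable
net_per_node = usable_per_node / REPLICATION               # TB, after 3x copies

print(f"usable per node: {usable_per_node:.1f} TB")
print(f"net per node after 3x replication: {net_per_node:.1f} TB")
```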

>
>> 2 x 40GbE for backend network
>You'd be lucky to write more than 800MB/s sustained to your 8 HDDs 
>(remember they will have to deal with competing reads and writes, this 
>is not a sequential synthetic write benchmark).
>Incidentally 1GB/s to 1.2GB/s (depending on configuration) would also 
>be the limit of your journal SSDs.
>Other than backfilling caused by cluster changes (OSD removed/added), 
>your limitation is nearly always going to be IOPS, not bandwidth.


Ok, after some discussion, we'll switch to 2 x 10GbE.
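The bandwidth argument quoted above can be checked with simple arithmetic; the ~100MB/s sustained figure per 7.2k HDD is an assumption for illustration:

```python
# Why the network is unlikely to be the bottleneck here:
# compare aggregate sustained HDD write bandwidth to one 10GbE link.
# HDD_MBPS is an assumed per-disk figure, not from the thread.

HDDS = 8
HDD_MBPS = 100                    # rough sustained MB/s per 7.2k HDD
LINK_GBPS = 10                    # a single 10GbE link

hdd_total = HDDS * HDD_MBPS       # aggregate MB/s the disks can absorb
link_mbps = LINK_GBPS * 1000 / 8  # Gbit/s -> MB/s, ignoring protocol overhead

print(f"8 HDDs: ~{hdd_total} MB/s, one 10GbE link: ~{link_mbps:.0f} MB/s")
```

One 10GbE link already exceeds what the disks can sustain, which is why 2 x 40GbE buys little here.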

>
>So 2x10GbE or if you're comfortable with it (I am ^o^) an Infiniband 
>backend (can be cheaper, less latency, plans for RDMA support in
>Ceph) should be more than sufficient.
>
>> 2 x 10GbE  for public network
>> 
>> META/MON:
>> 
>> SYS-6017R-72RFTP ->
>> http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
>> 2 x Intel Xeon e5-2637 v2
>> 4 x SSD Intel DC S3500 240GB raid 1+0
>You're likely to get better performance and of course MUCH better 
>durability by using 2 DC S3700, at about the same price.

Ok, we'll switch to 2 x Intel DC S3700 SSDs.

>
>> 128 GB RAM
>Total overkill for a MON, but I have no idea about MDS and RAM never 
>hurts.

Ok, we'll switch from 128 to 96 GB of RAM.

>
>In your follow-up you mentioned 3 mons, I would suggest putting 2 more 
>mons (only, not MDS) on OSD nodes and make sure that within the IP 
>numbering the "real" mons have the lowest IP addresses, because the MON 
>with the lowest IP becomes master (and thus the busiest).
>This way you can survive a loss of 2 nodes and still have a valid quorum.
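A sketch of how that monitor layout could look in ceph.conf; the hostnames and addresses here are purely hypothetical, chosen so the three dedicated mons have the lowest IPs:

```ini
# Hypothetical example: dedicated mons get the lowest addresses so one
# of them, not a mon co-located on an OSD node, becomes the master.
[global]
mon initial members = mon1, mon2, mon3, osd1, osd2
mon host = 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.11, 10.0.0.12
```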

Ok, got it


>
>Christian
>
>> 2 x 10 GbE
>> 
>> What do you think?
>> Any feedbacks, advices, or ideas are welcome!
>> 
>> Thanks so much
>> 
>> Regards,
>
>
>-- 
>Christian Balzer        Network/Systems Engineer                
>chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
>http://www.gol.com/

Thanks so much!

>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








