Hi Christian,

On 04/02/15 02:39, "Christian Balzer" <chibi@xxxxxxx> wrote:

>On Tue, 3 Feb 2015 15:16:57 +0000 Colombo Marco wrote:
>
>> Hi all,
>> I have to build a new Ceph storage cluster. After I've read the
>> hardware recommendations and some mail from this mailing list, I
>> would like to buy these servers:
>>
>
>Nick mentioned a number of things already I totally agree with, so don't
>be surprised if some of this feels like a repeat.
>
>> OSD:
>> SSG-6027R-E1R12L ->
>> http://www.supermicro.nl/products/system/2U/6027/SSG-6027R-E1R12L.cfm
>> Intel Xeon e5-2630 v2
>> 64 GB RAM
>
>As Nick said, v3 and more RAM might be helpful, depending on your use
>case (small writes versus large ones); even faster CPUs as well.

OK, we'll switch from v2 to v3 and from 64 to 96 GB of RAM.

>> LSI 2308 IT
>> 2 x SSD Intel DC S3700 400GB
>> 2 x SSD Intel DC S3700 200GB
>
>Why the separation of SSDs?
>They aren't going to be that busy with regards to the OS.

We would like to use the 400GB SSDs for a cache pool and the 200GB SSDs
for journaling.

>Get a case like Nick mentioned with 2 2.5" bays in the back, put 2 DC
>S3700 400GBs in there (connected to onboard 6Gb/s SATA3), partition them
>so that you have a RAID1 for the OS and plain partitions for the journals
>of the now 12 OSD HDDs in your chassis.
>Of course this optimization in terms of cost and density comes with a
>price: if one SSD should fail, you will have 6 OSDs down.
>Given how reliable the Intels are this is unlikely, but something you
>need to consider.
>
>If you want to limit the impact of an SSD failure and have just 2 OSD
>journals per SSD, get a chassis like the one above and 4 DC S3700 200GB,
>RAID10 them for the OS and put 2 journal partitions on each.
>
>I did the same with 8 3TB HDDs and 4 DC S3700 100GB; the HDDs (and CPU
>with 4KB IOPS) are the limiting factor, not the SSDs.
>
>> 8 x HDD Seagate Enterprise 6TB
>
>Are you really sure you need that density? One disk failure will result
>in a LOT of data movement once these become somewhat full.
>If you were to go for a 12 OSD node as described above, consider 4TB ones
>for the same overall density, while having more IOPS and likely the same
>price or less.

We chose the 6TB disks because we need a lot of storage in a small number
of servers, and we prefer servers that don't have too many disks. However,
we plan to fill each 6TB disk to at most 80%.
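To put rough numbers on the rebalancing impact, here is a quick
back-of-the-envelope sketch. The cluster size, the 3x replication and the
per-disk figures are assumptions of mine for illustration, not anything
measured:

# Back-of-the-envelope impact of losing one 6TB OSD (illustrative only).
# Assumed: 3x replication, 4 nodes, 8 x 6TB OSDs per node, <= 80% full.

raw_tb_per_osd = 6.0
fill_ratio = 0.8
osds_per_node = 8
nodes = 4
replication = 3

raw_total = raw_tb_per_osd * osds_per_node * nodes
usable_total = raw_total * fill_ratio / replication

# When one OSD dies, roughly everything it held has to be re-replicated
# elsewhere before the cluster is healthy again.
data_to_move_tb = raw_tb_per_osd * fill_ratio

print(f"raw capacity:                  {raw_total:.1f} TB")
print(f"usable (3x, 80% full):         {usable_total:.1f} TB")
print(f"backfill after losing one OSD: ~{data_to_move_tb:.1f} TB")
print(f"that is ~{data_to_move_tb / (raw_total * fill_ratio) * 100:.1f}% "
      f"of all raw data on the cluster being rewritten")

Nearly 5TB of backfill per failed OSD is the price of the 6TB drives; the
4TB / 12-OSD layout Christian suggests would move proportionally less per
failure.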
>> 2 x 40GbE for backend network
>
>You'd be lucky to write more than 800MB/s sustained to your 8 HDDs
>(remember they will have to deal with competing reads and writes, this is
>not a sequential synthetic write benchmark).
>Incidentally 1GB/s to 1.2GB/s (depending on configuration) would also be
>the limit of your journal SSDs.
>Other than backfilling caused by cluster changes (OSD removed/added), your
>limitation is nearly always going to be IOPS, not bandwidth.

OK, after some discussion we'll switch to 2 x 10GbE (a quick sanity check
of the bandwidth numbers is at the end of this mail).

>So 2x10GbE or, if you're comfortable with it (I am ^o^), an Infiniband
>backend (can be cheaper, less latency, plans for RDMA support in
>Ceph) should be more than sufficient.
>
>> 2 x 10GbE for public network
>>
>> META/MON:
>>
>> SYS-6017R-72RFTP ->
>> http://www.supermicro.com/products/system/1U/6017/SYS-6017R-72RFTP.cfm
>> 2 x Intel Xeon e5-2637 v2
>> 4 x SSD Intel DC S3500 240GB raid 1+0
>
>You're likely to get better performance and of course MUCH better
>durability by using 2 DC S3700, at about the same price.

OK, we'll switch to 2 x SSD DC S3700.

>> 128 GB RAM
>
>Total overkill for a MON, but I have no idea about MDS and RAM never
>hurts.

OK, we'll switch from 128 to 96 GB.

>In your follow-up you mentioned 3 mons. I would suggest putting 2 more
>mons (only, not MDS) on OSD nodes and make sure that within the IP
>numbering the "real" mons have the lowest IP addresses, because the MON
>with the lowest IP becomes master (and thus the busiest).
>This way you can survive a loss of 2 nodes and still have a valid quorum.

OK, got it (a small illustration of the IP ordering is at the end of this
mail).

>Christian
>
>> 2 x 10 GbE
>>
>> What do you think?
>> Any feedback, advice, or ideas are welcome!
>>
>> Thanks so much
>>
>> Regards,
>
>
>--
>Christian Balzer        Network/Systems Engineer
>chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
>http://www.gol.com/

Thanks so much!
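PS: a quick sanity check of the bandwidth numbers discussed above. The
per-device figures are my own assumptions, not measurements:

# Rough per-node bandwidth sanity check (assumed figures only).

hdd_count = 8
hdd_sustained_mb_s = 100     # mixed read/write on a 7.2k enterprise HDD
journal_ssd_count = 2
ssd_write_mb_s = 460         # roughly what a DC S3700 400GB sustains

# Every write goes through the journal SSDs and then to the HDDs on this
# node, so the node tops out at whichever side is slower.
hdd_limit = hdd_count * hdd_sustained_mb_s
ssd_limit = journal_ssd_count * ssd_write_mb_s
node_limit_mb_s = min(hdd_limit, ssd_limit)

# Raw link bandwidth of the two backend options, for comparison.
links = {"2 x 10GbE": 2 * 10_000 / 8, "2 x 40GbE": 2 * 40_000 / 8}

print(f"HDD limit:  {hdd_limit} MB/s")
print(f"SSD limit:  {ssd_limit} MB/s")
print(f"node limit: {node_limit_mb_s} MB/s")
for name, mb_s in links.items():
    print(f"{name}: {mb_s:.0f} MB/s of raw link bandwidth")

So even with replication traffic on top, 2 x 10GbE leaves a lot of
headroom over what the disks in one node can actually write.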
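PPS: a small illustration of the monitor IP ordering Christian describes
(the hostnames and addresses are made up). As I understand it, Ceph ranks
monitors by address, lowest first, and the lowest rank leads the quorum:

# Illustrative only: order monitors the way Ceph ranks them (by IP),
# so the dedicated MON nodes should get the lowest addresses.
import ipaddress

mons = {
    "mon-a (dedicated)":      "10.0.0.11",
    "mon-b (dedicated)":      "10.0.0.12",
    "mon-c (dedicated)":      "10.0.0.13",
    "osd-node-1 (extra mon)": "10.0.0.21",
    "osd-node-2 (extra mon)": "10.0.0.22",
}

by_rank = sorted(mons.items(), key=lambda kv: ipaddress.ip_address(kv[1]))
for rank, (name, ip) in enumerate(by_rank):
    role = "leader" if rank == 0 else "peon"
    print(f"rank {rank}: {name:<24} {ip:<12} -> {role}")

With 5 mons laid out like this, losing any 2 nodes still leaves 3 monitors
in quorum, and the leader stays on a dedicated MON node.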