Re: 800TB - Ceph Physical Architecture Proposal

Maxime Guyot <Maxime.Guyot@xxxxxxxxx> · Fri, 8 Apr 2016 07:39:18 +0000

Hello,

On 08/04/16 04:47, "ceph-users on behalf of Christian Balzer" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of chibi@xxxxxxx> wrote:

>
>> 11 OSD nodes:
>> -SuperMicro 6047R-E1R36L
>> --2x E5-2603v2
>Vastly underpowered for 36 OSDs.
>> --128GB RAM
>> --36x 6TB OSD
>> --2x Intel P3700 (journals)
>Which exact model?
>If it's the 400GB one, that's 2GB/s maximum write speed combined.
>Slightly below the roughly 2.5GB/s (36*70MB/s) I'd expect your 36 HDDs to
>be able to write, but not unreasonably so.
>However your initial network thoughts are massively overspec'ed for this
>kind of performance.
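
Christian's journal arithmetic can be checked with a quick sketch. It assumes filestore journals (every write lands on the journal before the data disk, so the node's sustained write rate is capped by the slower side), roughly 1GB/s sequential write per 400GB P3700 (his "2GB/s combined" figure) and roughly 70MB/s per HDD. The helper name is hypothetical.

```python
# Rough filestore journal vs. HDD bandwidth check (figures from this thread).
JOURNAL_MBPS_PER_SSD = 1000  # assumed ~1GB/s per 400GB P3700
SSD_COUNT = 2
HDD_MBPS = 70                # assumed sustained write per HDD
HDD_COUNT = 36

def journal_headroom(ssd_mbps, ssd_count, hdd_mbps, hdd_count):
    """Return (journal MB/s, HDD MB/s, journal/HDD ratio) for one node."""
    journal = ssd_mbps * ssd_count
    hdd = hdd_mbps * hdd_count
    return journal, hdd, journal / hdd

journal, hdd, ratio = journal_headroom(JOURNAL_MBPS_PER_SSD, SSD_COUNT,
                                       HDD_MBPS, HDD_COUNT)
# With filestore, sustained node writes are limited by min(journal, hdd).
print(f"journals: {journal} MB/s, HDDs: {hdd} MB/s, ratio: {ratio:.2f}")
```

This prints a ratio of about 0.79, i.e. the two journals sit slightly below what the spindles could absorb, which matches "slightly below, but not unreasonably so".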

The OSD server sizing guidelines I have seen are:
- 1GB of RAM per TB of OSD capacity for replicated pools, i.e. about 216GB for 36x6TB
- 0.5 core or 1GHz per OSD disk for replicated pools
- 1 or 2 cores per SSD

Sources:
- Minimum hardware recommendations: http://docs.ceph.com/docs/hammer/start/hardware-recommendations/#minimum-hardware-recommendations
- Video (timestamp 12:00): https://www.youtube.com/watch?v=XBfYY-VhzpY
- Slides (slide 20): http://www.slideshare.net/mirantis/ceph-talk-vancouver-20

So you might want to increase the RAM to around 192-256GB and the CPUs to something like dual 10-core 2GHz (or faster) parts, for example the E5-2660 v2.
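
A minimal sketch applying the rules of thumb above to one node of the proposal. It assumes HDD OSDs only (the P3700s are journals, not OSDs, so the SSD rule is left out), takes the stricter of the "0.5 core" and "1GHz" readings, and uses the 2.2GHz base clock of the E5-2660 v2. `size_osd_node` is a hypothetical helper, not part of any Ceph tool.

```python
import math

# Rules of thumb for replicated pools, as listed above.
RAM_GB_PER_TB = 1.0
CORES_PER_HDD_OSD = 0.5
GHZ_PER_HDD_OSD = 1.0

def size_osd_node(osd_count, tb_per_osd, core_ghz):
    """Return (RAM in GB, physical cores) suggested for one OSD node."""
    ram_gb = osd_count * tb_per_osd * RAM_GB_PER_TB
    cores_by_count = osd_count * CORES_PER_HDD_OSD
    cores_by_clock = osd_count * GHZ_PER_HDD_OSD / core_ghz
    # Take whichever reading of the CPU rule asks for more cores.
    return ram_gb, math.ceil(max(cores_by_count, cores_by_clock))

ram, cores = size_osd_node(36, 6, 2.2)
print(f"RAM: {ram:.0f} GB, cores: {cores}")  # RAM: 216 GB, cores: 18
```

At the E5-2603 v2's 1.8GHz the same function asks for 20 cores, against the 8 a dual-socket board with those CPUs provides.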

>
>> 
>> 3 MDS nodes:
>> -SuperMicro 1028TP-DTR (one node from scale-out chassis)
>> --2x E5-2630v4
>> --128GB RAM
>> --2x 120GB SSD (RAID 1 for OS)
>Not using CephFS, but if the MDS are like all the other Ceph bits (MONs in
>particular), they are likely to do happy writes to leveldb or the like;
>do verify that.
>If that's the case, fast and durable SSDs will be needed.
>
>> 
>> 5 MON nodes:
>> -SuperMicro 1028TP-DTR (one node from scale-out chassis)
>> --2x E5-2630v4
>> --128GB RAM
>> --2x 120GB SSD (RAID 1 for OS)
>> 
>Total overkill, are you sure you didn't mix up the CPUs for the OSDs with
>the ones for the MONs?
>Also, while dedicated MONs are nice, they really can live rather frugally,
>except for the lust for fast, durable storage.
>If I were you, I'd get 2 dedicated MON nodes (with few, fastish cores) and
>32-64GB RAM, then put the other 3 on your MDS nodes, which seem to have
>plenty of resources to go around.
>You will want the dedicated MONs to have the lowest IPs in your network,
>since the monitor leader is chosen by IP address (lowest wins).
>
>Christian
>> We'd use our existing Zabbix deployment for monitoring and ELK for log
>> aggregation.
>> 
>> Provisioning would be through puppet-razor (PXE) and puppet.
>> 
>> Again, thank you for any information you can provide
>> 
>> --Brady
>
>
>-- 
>Christian Balzer        Network/Systems Engineer                
>chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
>http://www.gol.com/
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
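
Christian's point about MON placement can be illustrated with a small sketch. It assumes, as he describes, that the leader is the in-quorum monitor with the lowest IP address; the hostnames, addresses and `expected_leader` helper are hypothetical.

```python
import ipaddress

# Hypothetical layout: 2 dedicated MONs on the lowest IPs,
# 3 more MONs co-located on the MDS nodes.
MONS = {
    "mon-a": "10.0.0.11",
    "mon-b": "10.0.0.12",
    "mds-1": "10.0.0.21",
    "mds-2": "10.0.0.22",
    "mds-3": "10.0.0.23",
}

def expected_leader(mons, in_quorum):
    """Return the name of the in-quorum MON with the lowest IP address."""
    candidates = [name for name in mons if name in in_quorum]
    return min(candidates, key=lambda name: ipaddress.ip_address(mons[name]))

print(expected_leader(MONS, set(MONS)))              # mon-a
print(expected_leader(MONS, set(MONS) - {"mon-a"}))  # mon-b
```

Comparing `ipaddress` objects rather than strings matters: as strings, "10.0.0.9" sorts after "10.0.0.11".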

Regards,
Maxime G