Re: general ceph cluster design

Hi Nick,

See inline comments.

Cheers,
Maxime 

On 25/11/16 16:01, "ceph-users on behalf of nick" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of nick@xxxxxxx> wrote:

>    Hi,
>    we are currently planning a new ceph cluster which will be used for 
>    virtualization (providing RBD storage for KVM machines) and we have some 
>    general questions.
>    
>    * Is it advisable to have one ceph cluster spread over multiple datacenters 
>    (latency is low, as they are not so far from each other)? Is anybody doing 
>    this in a production setup? We know that any network issue would affect virtual 
>    machines in all locations instead of just one, but we can see a lot of advantages 
>    as well.

I think the general consensus is to limit the size of the failure domain. That said, it depends on the use case and on what you mean by “multiple datacenters” and “latency is low”: writes have to be journal-ACKed by the OSDs in the other datacenter, so if there is 10ms of latency between Location1 and Location2, each write operation gains 10ms whenever the crushmap requires a replica in each location. Speaking of which, a 3rd location would help with sorting out quorum (1 mon at each location) in a “triangle” configuration.
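
To make that concrete, a crushmap rule along these lines would force one replica per datacenter (a sketch only, assuming a hierarchy with “datacenter” buckets under the default root; the rule and bucket names are illustrative):

    rule replicated_multi_dc {
            ruleset 1
            type replicated
            min_size 2
            max_size 3
            # pick as many datacenters as the pool size...
            step take default
            step choose firstn 0 type datacenter
            # ...then one host (and one OSD under it) per datacenter
            step chooseleaf firstn 1 type host
            step emit
    }

With size=3 and three sites this gives one copy per site, which is exactly the case where every write pays the inter-site round trip.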

If this is for DR: RBD mirroring is supposed to address that, so you might not want one big cluster (= one big failure domain).
If this is for VM live migration: that usually requires stretched L2 adjacency (again, one failure domain) or overlays (VXLAN and the like), and the “network trombone” effect can be a problem depending on the setup.
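
For reference, per-image RBD mirroring is set up roughly like this in Jewel (a sketch; the “vms” pool and “vm-disk1” image are made-up names, and an rbd-mirror daemon plus a pool peer, added with “rbd mirror pool peer add”, are also needed on the remote cluster):

    # journaling (which requires exclusive-lock) drives the mirroring
    rbd feature enable vms/vm-disk1 exclusive-lock
    rbd feature enable vms/vm-disk1 journaling
    # per-image mirroring mode on the pool, then enable the image
    rbd mirror pool enable vms image
    rbd mirror image enable vms/vm-disk1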

I know of Nantes University, who used (or is still using) a 3-datacenter Ceph cluster: http://dachary.org/?p=2087 

>    
>    * We are planning to combine the hosts for ceph and KVM (so far we are using 
>    separate hosts for virtual machines and ceph storage). We see the big 
>    advantage (next to the price drop) of an automatic ceph expansion when adding 
>    more compute nodes as we got into situations in the past where we had too many 
>    compute nodes and the ceph cluster was not expanded properly (performance 
>    dropped over time). On the other side there would be changes to the crush map 
>    every time we add a compute node and that might end in a lot of data movement 
>    in ceph. Is anybody using combined servers for compute and ceph storage and 
>    has some experience?

The challenge is to keep ceph-osd from becoming a noisy neighbor for the VMs hosted on the hypervisor, especially during recovery. I’ve heard of people using CPU pinning, containers, and QoS to keep it under control.
Sébastien has an article about this topic on his blog: https://www.sebastien-han.fr/blog/2016/07/11/Quick-dive-into-hyperconverged-architecture-with-OpenStack-and-Ceph/ 
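
With systemd-managed OSDs, the pinning/QoS idea can be expressed as a drop-in like this (a sketch only; the core list and limits are placeholders to adapt, not recommendations):

    # /etc/systemd/system/ceph-osd@.service.d/resources.conf
    [Service]
    # keep the OSDs off the cores reserved for VM vCPUs
    CPUAffinity=0 1 2 3
    # cap each OSD instance at one core's worth of CPU time
    CPUQuota=100%
    MemoryLimit=16G

followed by “systemctl daemon-reload” and a restart of ceph-osd.target.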

Regarding the performance that dropped over time: you can look at improving your capacity:performance ratio (more spindles or SSDs per TB of usable capacity).
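
As a rough back-of-envelope for that ratio (made-up numbers, assuming filestore with colocated journals and 7.2k-rpm drives at ~100 IOPS each):

    800 OSDs x ~100 IOPS                 ~= 80,000 raw IOPS
    / (3 replicas x 2 journal writes)    ~= 13,000 client write IOPS

Divide that by the number of VMs you plan to run and you can quickly check whether the ratio still holds as compute nodes are added.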

>    * is there a maximum amount of OSDs in a ceph cluster? We are planning to use 
>    a minimum of 8 OSDs per server and going to have a cluster with about 100 
>    servers which would end in about 800 OSDs.

There are a couple of threads from the ML about this: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028371.html and http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014246.html 

>    
>    Thanks for any help...
>    
>    Cheers
>    Nick



