Re: general ceph cluster design

Hi Nick,

We have a Ceph cluster spread across 3 datacenters at 3 institutions
in Michigan (UM, MSU, WSU).  It certainly is possible.  As noted, you
will have increased latency for write operations and overall reduced
throughput as latency increases.  Latency between our sites is 3-5ms.

We did some simulated latency testing with netem where we induced
varying levels of latency on one of our storage hosts (60 OSDs).  Some
information about the results is on our website:
http://www.osris.org/performance/latency
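
For anyone who wants to reproduce that kind of test, a minimal sketch of
one way to inject latency with tc/netem (the interface name and delay
values below are just examples, not our exact test parameters):

  # add 5ms of delay (+/- 1ms jitter) on the storage host's interface
  tc qdisc add dev eth0 root netem delay 5ms 1ms

  # change the delay between test runs
  tc qdisc change dev eth0 root netem delay 10ms 1ms

  # remove it again afterwards
  tc qdisc del dev eth0 root netem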

We also had success running a 4th cluster site at Supercomputing in
SLC.  We'll be putting up information about our experiences there in the
near future.

thanks,
Ben


On Mon, Nov 28, 2016 at 12:06 AM, nick <nick@xxxxxxx> wrote:
> Hi Maxime,
> thank you for the information given. We will have a look and check.
>
> Cheers
> Nick
>
> On Friday, November 25, 2016 09:48:35 PM Maxime Guyot wrote:
>> Hi Nick,
>>
>> See inline comments.
>>
>> Cheers,
>> Maxime
>>
>> On 25/11/16 16:01, "ceph-users on behalf of nick"
>> <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of nick@xxxxxxx> wrote:
>
>>
>> >    Hi,
>> >    we are currently planning a new ceph cluster which will be used for
>> >    virtualization (providing RBD storage for KVM machines) and we have
>> >    some
>> >    general questions.
>> >
>> >    * Is it advisable to have one ceph cluster spread over multiple
>> >    datacenters (latency is low, as they are not so far from each
>> >    other)? Is anybody doing this in a production setup? We know that any
>> >    network issue would affect virtual machines in all locations instead
>> >    of just one, but we can see a lot of advantages as well.
>>
>>
>> I think the general consensus is to limit the size of the failure domain.
>> That said, it depends on the use case and what you mean by “multiple
>> datacenters” and “latency is low”: writes will have to be journal-ACKed
>> by the OSDs in the other datacenter. If there is 10ms latency between
>> Location1 and Location2, then it would add 10ms to each write operation if
>> the crushmap requires replicas in each location. Speaking of which, a 3rd
>> location would help with sorting out quorum (1 mon at each location) in a
>> “triangle” configuration.
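>>
>> As an illustration only (the bucket names and numbers are made up, and the
>> exact syntax should be checked against your Ceph version), a CRUSH rule
>> that places one replica in each datacenter could look roughly like this:
>>
>>     rule replicated_3dc {
>>             ruleset 1
>>             type replicated
>>             min_size 3
>>             max_size 3
>>             step take default
>>             step chooseleaf firstn 0 type datacenter
>>             step emit
>>     }
>>
>> With a rule like that, every write has to be acknowledged by an OSD in
>> each site, which is where the added per-write latency comes from.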
>
>> If this is for DR: RBD-mirroring is supposed to address that, so you might
>> not want to have 1 big cluster ( = failure domain). If this is for VM live
>> migration: that usually requires L2 adjacency stretched across sites
>> (failure domain) or overlays (VXLAN and the like), and the “network
>> trombone” effect can be a problem depending on the setup.
>> I know of Nantes University, which used/is using a 3-datacenter Ceph
>> cluster: http://dachary.org/?p=2087
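>>
>> If you go the rbd-mirror route, a rough sketch of the setup (pool, image
>> and cluster names below are only placeholders):
>>
>>     # journaling is required on the images to be mirrored
>>     rbd feature enable rbd/vm-disk1 exclusive-lock
>>     rbd feature enable rbd/vm-disk1 journaling
>>
>>     # mirror the whole pool and add the remote cluster as a peer
>>     rbd mirror pool enable rbd pool
>>     rbd mirror pool peer add rbd client.admin@remote
>>
>> plus an rbd-mirror daemon running on the backup cluster to replay the
>> journals.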
>
>>
>> >
>> >    * We are planning to combine the hosts for ceph and KVM (so far we are
>> >    using separate hosts for virtual machines and ceph storage). We see
>> >    the big advantage (besides the cost savings) of an automatic ceph
>> >    expansion when adding more compute nodes, as we got into situations in
>> >    the past where we had too many compute nodes and the ceph cluster was
>> >    not expanded properly (performance dropped over time). On the other
>> >    hand, there would be changes to the crush map every time we add a
>> >    compute node, and that might result in a lot of data movement in ceph.
>> >    Is anybody using combined servers for compute and ceph storage and has
>> >    some experience?
>>
>>
>> The challenge is to keep ceph-osd from becoming a noisy neighbor for the
>> VMs hosted on the hypervisor, especially under recovery. I’ve heard of
>> people using CPU pinning, containers, and QoS to keep it under control.
>> Sébastien Han has an article on his blog about this topic:
>> https://www.sebastien-han.fr/blog/2016/07/11/Quick-dive-into-hyperconverged-architecture-with-OpenStack-and-Ceph/
>> Regarding the performance that dropped over time, you can look at
>> improving your capacity:performance ratio.
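>>
>> One possible way to do the pinning (an untested sketch; the unit name
>> matches the stock systemd packaging, but the core list and memory limit
>> here are invented and site-specific) is a systemd drop-in for the OSD
>> units:
>>
>>     # /etc/systemd/system/ceph-osd@.service.d/pinning.conf
>>     [Service]
>>     # keep the OSD daemons on cores not used for guest vCPUs
>>     CPUAffinity=0 1 2 3
>>     # cap the memory the OSDs can take away from the hypervisor
>>     MemoryLimit=16G
>>
>> followed by a "systemctl daemon-reload" and a restart of the OSDs.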
>
>>
>> >    * Is there a maximum number of OSDs in a ceph cluster? We are planning
>> >    to use a minimum of 8 OSDs per server and are going to have a cluster
>> >    with about 100 servers, which would end up at about 800 OSDs.
>>
>>
>> There are a couple of threads from the ML about this:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028371.html
>> and
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014246.html
>
>>
>> >
>> >    Thanks for any help...
>> >
>> >    Cheers
>> >    Nick
>>
>>
>
>
> --
> Sebastian Nickel
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



