Re: general ceph cluster design

Hi Ben,
thanks for the information as well. It looks like we will first do some latency 
tests between our data centers (thanks for the netem hint) before deciding 
which topology is best for us. For simple DR scenarios, rbd mirroring sounds 
like the better solution so far.
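
For the latency tests we will probably just add artificial delay with netem on 
one of the OSD hosts and measure from a client with rados bench, roughly like 
this (interface name, pool and delay value are only placeholders):

  # add 5ms of one-way delay on the cluster-facing interface of one OSD host
  tc qdisc add dev eth0 root netem delay 5ms
  # measure write latency/throughput from a client
  rados -p rbd bench 60 write
  # remove the artificial delay again
  tc qdisc del dev eth0 root netem
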
We are still fans of the hyperconverged setup (compute + ceph on one node) and 
are searching for suitable hardware. I think separating the resource usage of 
the two workloads should be doable with CPU pinning and plain old cgroups.
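
As a first sketch for that separation (core ranges, the VM name and the single 
ceph-osd pid below are made up), we would put the OSDs into a cpuset cgroup 
and keep the guests on the remaining cores via libvirt:

  # reserve cores 0-7 (and NUMA node 0) for the ceph-osd processes
  mkdir /sys/fs/cgroup/cpuset/ceph-osd
  echo 0-7 > /sys/fs/cgroup/cpuset/ceph-osd/cpuset.cpus
  echo 0 > /sys/fs/cgroup/cpuset/ceph-osd/cpuset.mems
  echo $(pidof -s ceph-osd) > /sys/fs/cgroup/cpuset/ceph-osd/tasks  # repeat per OSD
  # pin a guest's vCPU 0 to the non-ceph cores 8-15
  virsh vcpupin myvm 0 8-15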

Cheers
Nick

On Monday, November 28, 2016 12:23:07 PM Benjeman Meekhof wrote:
> Hi Nick,
> 
> We have a Ceph cluster spread across 3 datacenters at 3 institutions
> in Michigan (UM, MSU, WSU).  It certainly is possible.  As noted you
> will have increased latency for write operations and overall reduced
> throughput as latency increases.  Latency between our sites is 3-5ms.
> 
> We did some simulated latency testing with netem where we induced
> varying levels of latency on one of our storage hosts (60 OSD).  Some
> information about the results is on our website:
> http://www.osris.org/performance/latency
> 
> We also had success running a 4th cluster site at Supercomputing in
> SLC.  We'll be putting up information on experiences there in the near
> future.
> 
> thanks,
> Ben
> 
> On Mon, Nov 28, 2016 at 12:06 AM, nick <nick@xxxxxxx> wrote:
> > Hi Maxime,
> > thank you for the information given. We will have a look and check.
> > 
> > Cheers
> > Nick
> > 
> > On Friday, November 25, 2016 09:48:35 PM Maxime Guyot wrote:
> >> Hi Nick,
> >> 
> >> See inline comments.
> >> 
> >> Cheers,
> >> Maxime
> >> 
> >> On 25/11/16 16:01, "ceph-users on behalf of nick"
> >> <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of nick@xxxxxxx> wrote:
> >> >    Hi,
> >> >    we are currently planning a new ceph cluster which will be used for
> >> >    virtualization (providing RBD storage for KVM machines) and we have
> >> >    some
> >> >    general questions.
> >> >    
> >> >    * Is it advisable to have one ceph cluster spread over multiple
> >> >    datacenters (latency is low, as they are not so far from each
> >> >    other)? Is anybody doing this in a production setup? We know that
> >> >    any network issue would affect virtual machines in all locations
> >> >    instead of just one, but we can see a lot of advantages as well.
> >> 
> >> I think the general consensus is to limit the size of the failure domain.
> >> That said, it depends on the use case and what you mean by “multiple
> >> datacenters” and “latency is low”: writes will have to be journal-ACKed
> >> by the OSDs in the other datacenter. If there is 10ms latency between
> >> Location1 and Location2, then it would add 10ms to each write operation
> >> if the crushmap requires replicas in each location. Speaking of which, a
> >> 3rd location would help with sorting out the quorum (1 mon at each
> >> location) in a “triangle” configuration.
> >> 
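> >> For illustration (Jewel-era syntax; rule and pool names are placeholders,
> >> and this assumes the crushmap already has “datacenter” buckets), one
> >> replica per datacenter would be something like:
> >> 
> >>   # rule that places each replica in a different datacenter
> >>   ceph osd crush rule create-simple one-per-dc default datacenter
> >>   ceph osd crush rule dump one-per-dc          # note the rule id
> >>   ceph osd pool set rbd size 3
> >>   ceph osd pool set rbd crush_ruleset <rule id from above>
> >> 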
> >> If this is for DR: RBD mirroring is supposed to address that; you might
> >> not want to have 1 big cluster (= failure domain).
> >> 
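> >> Roughly (Jewel syntax; pool, user and cluster names are placeholders),
> >> mirroring a pool to a second cluster looks like:
> >> 
> >>   # on both clusters: enable mirroring for the pool (images need the
> >>   # journaling feature, e.g. rbd feature enable <pool>/<image> journaling)
> >>   rbd mirror pool enable <pool> pool
> >>   # on the backup cluster: add the primary as a peer and run the
> >>   # rbd-mirror daemon there
> >>   rbd mirror pool peer add <pool> client.<user>@<primary-cluster>
> >> 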
> >> If this is for VM live migration: it usually requires L2 adjacency spread
> >> across sites (failure domain) or overlays (VXLAN and the like), and the
> >> “network trombone” effect can be a problem depending on the setup.
> >> 
> >> I know of Nantes University, who used/is using a 3-datacenter Ceph
> >> cluster: http://dachary.org/?p=2087
> >> 
> >> >    * We are planning to combine the hosts for ceph and KVM (so far we
> >> >    are using separate hosts for virtual machines and ceph storage). We
> >> >    see the big advantage (next to the price drop) of automatic ceph
> >> >    expansion when adding more compute nodes, as we got into situations
> >> >    in the past where we had too many compute nodes and the ceph cluster
> >> >    was not expanded properly (performance dropped over time). On the
> >> >    other side, there would be changes to the crush map every time we
> >> >    add a compute node, and that might end in a lot of data movement in
> >> >    ceph. Is anybody using combined servers for compute and ceph storage
> >> >    and has some experience?
> >> 
> >> The challenge is to avoid ceph-osd becoming a noisy neighbor for the VMs
> >> hosted on the hypervisor, especially under recovery. I’ve heard of people
> >> using CPU pinning, containers, and QoS to keep it under control. Sébastien
> >> has an article on his blog about this topic:
> >> https://www.sebastien-han.fr/blog/2016/07/11/Quick-dive-into-hyperconverged-architecture-with-OpenStack-and-Ceph/
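> >> 
> >> For the QoS part, a minimal sketch with systemd resource controls (the
> >> OSD id and the limits are just examples, assuming the OSDs run as
> >> ceph-osd@<id> units):
> >> 
> >>   # cap one OSD at two cores worth of CPU and 4G of memory
> >>   systemctl set-property ceph-osd@3 CPUQuota=200% MemoryLimit=4G
> >> 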
> >> Regarding the performance dropping over time, you can look at improving
> >> your capacity:performance ratio.
> >> 
> >> >    * Is there a maximum number of OSDs in a ceph cluster? We are
> >> >    planning to use a minimum of 8 OSDs per server and are going to
> >> >    have a cluster with about 100 servers, which would end up at about
> >> >    800 OSDs.
> >> 
> >> There are a couple of threads from the ML about this:
> >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028371.html
> >> and
> >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014246.html
> >> 
> >> >    Thanks for any help...
> >> >    
> >> >    Cheers
> >> >    Nick
> > 
> > --
> > Sebastian Nickel
> > Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> > Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
 
-- 
Sebastian Nickel
Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
