Re: Production/Non-production segmentation

On Wed, Jul 31, 2013 at 12:19 PM, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
> Due to the speed of releases in the Ceph project, I feel having separate physical hardware is the safer way to go, especially in light of your mention of an SLA for your production services.
 
Ah. I guess I should offer a little more background as to what I mean by production vs. non-production: customer-facing, and not.

We're using Ceph primarily for volume storage with OpenStack at the moment and operate two OpenStack clusters: one for all of our customer-facing services (which require a higher SLA) and one for all of our internal services. The idea is that everything customer-facing is physically segmented from anything our developers might be testing internally.

What I'm wondering:

Does anyone else here do this?
If so, do you run multiple Ceph clusters?
Do you let Ceph sort itself out? 
Can this be done with a single physical cluster, but multiple logical clusters? 
Should it be? 

I know that, mathematically speaking, the larger your Ceph cluster is, the more evenly distributed the load (thanks to CRUSH). I'm wondering if, in practice, RBD can still create hotspots (say from a runaway service with multiple instances and volumes that is suddenly doing a ton of IO). This would increase IO latency across the Ceph cluster, I'd assume, and could impact the performance of customer-facing services.
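
(For a rough sense of why: a 100 GB RBD image striped into the default 4 MB objects is 25,600 objects, which CRUSH hashes across the pool's placement groups, so even a single hot volume ends up touching most of the OSDs in a decent-sized cluster. Which is exactly why I'd expect one noisy tenant to raise latency everywhere rather than just on one disk.)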

So, to some degree, physical segmentation makes sense to me. But could we simply reserve some OSDs on each physical host for a "production" logical cluster and use the rest for a "development" logical cluster (separate MON clusters for each, but everything running on the same hardware)? Or, given a sufficiently large cluster, is this not even a concern?
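
Another variant I've been toying with: keep it one Ceph cluster, but use CRUSH to pin a "production" pool to a dedicated set of OSDs. Something like this (host and root names invented, and I haven't actually tried it, so treat it as a sketch):

    # Hypothetical CRUSH map fragment: one cluster, two disjoint trees.
    # The node1-prod/node1-dev host buckets (declared elsewhere in the
    # map) would each hold the OSDs reserved for that role on node1.
    root prod {
            id -10
            alg straw
            hash 0  # rjenkins1
            item node1-prod weight 2.000
            item node2-prod weight 2.000
    }
    root dev {
            id -11
            alg straw
            hash 0  # rjenkins1
            item node1-dev weight 2.000
            item node2-dev weight 2.000
    }
    rule prod {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take prod
            step chooseleaf firstn 0 type host
            step emit
    }

Then "ceph osd pool set volumes crush_ruleset 3" would pin the OpenStack volumes pool to the prod tree. Whether that buys any real isolation under load is exactly what I'm wondering.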

I'm also interested in hearing about experiences running CephFS, Swift, and RBD all on a single cluster, or whether people have chosen separate clusters for these as well. For example, if you need faster volume storage for RBD, you might go for more spindles and smaller disks, versus fewer, larger disks for object storage, which can tolerate higher latency than volume storage.
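
If anyone does mix them on one cluster, I'd love to know whether you split the hardware with CRUSH the same way, e.g. something like the following (pool names and ruleset numbers invented, again untested):

    # Assuming ruleset 1 targets a root of small/fast disks and
    # ruleset 2 a root of big/slow ones, per the fragment above:
    ceph osd pool create volumes 1024 1024
    ceph osd pool set volumes crush_ruleset 1          # RBD on fast spindles
    ceph osd pool create .rgw.buckets 512 512
    ceph osd pool set .rgw.buckets crush_ruleset 2     # object data, latency-tolerant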

> A separate non-production cluster will allow you to test and validate new versions (including point releases within a stable series) before you attempt to upgrade your production cluster.

Oh yeah. I'm doing that for sure. 
 
Thanks,

Greg
