On 7/31/2013 3:34 PM, Greg Poirier wrote:
> On Wed, Jul 31, 2013 at 12:19 PM, Mike Dawson <mike.dawson@xxxxxxxxxxxx> wrote:
>> Due to the speed of releases in the Ceph project, I feel separate
>> physical hardware is the safer way to go, especially in light of your
>> mention of an SLA for your production services.
>
> Ah. I should offer a little more background on what I mean by
> production vs. non-production: customer-facing, and not.
That makes more sense.
> We're using Ceph primarily for volume storage with OpenStack at the
> moment and operate two OpenStack clusters: one for all of our
> customer-facing services (which require a higher SLA) and one for all
> of our internal services. The idea is that everything customer-facing
> is physically segmented from anything our developers might be testing
> internally.
>
> What I'm wondering: does anyone else here do this?
Have you looked at Ceph Pools? I think you may find they address many of your concerns while maintaining a single cluster.
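For example, something along these lines would give each environment
its own pool and its own cephx key (a rough sketch; the pool names,
PG counts, and client names are just placeholders):

  # one pool per environment
  ceph osd pool create prod-volumes 1024
  ceph osd pool create dev-volumes 256

  # separate cephx keys, so dev clients can't touch the prod pool
  ceph auth get-or-create client.prod mon 'allow r' osd 'allow rwx pool=prod-volumes'
  ceph auth get-or-create client.dev mon 'allow r' osd 'allow rwx pool=dev-volumes'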
> If so, do you run multiple Ceph clusters, or do you let Ceph sort
> itself out? Can this be done with a single physical cluster but
> multiple logical clusters? Should it be?
>
> I know that, mathematically speaking, the larger your Ceph cluster is,
> the more evenly distributed the load (thanks to CRUSH). I'm wondering
> whether, in practice, RBD can still create hotspots (say, from a
> runaway service with multiple instances and volumes that is suddenly
> doing a ton of IO). I'd assume this would increase IO latency across
> the Ceph cluster and could impact the performance of customer-facing
> services. So, to some degree, physical segmentation makes sense to me.
> But could we instead reserve some OSDs per physical host for a
> "production" logical cluster and use the rest for a "development"
> logical cluster (separate MON clusters for each, but all running on
> the same hardware)? Or, given a sufficiently large cluster, is this
> not even a concern?
>
> I'm also interested in hearing about experience running CephFS, Swift,
> and RBD all on a single cluster, or whether people have chosen
> multiple clusters for these as well. For example, you might want more
> spindles and smaller disks for faster RBD volume storage, but fewer
> spindles and larger disks for object storage, which can tolerate
> higher latency than volume storage.
See the response from Greg F. of Inktank to a similar question:
http://comments.gmane.org/gmane.comp.file-systems.ceph.user/2090
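If you do want to dedicate some OSDs per host to production and the
rest to development, you can express that with CRUSH inside a single
cluster rather than running two. Roughly (an untested sketch; you'd
define a second root and one rule per root when editing the decompiled
map, and the ruleset numbers below are just examples):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt: add e.g. "root prod" and "root dev" buckets
  # holding disjoint sets of OSDs, plus one rule taking each root
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new

  # point each pool at its rule
  ceph osd pool set prod-volumes crush_ruleset 1
  ceph osd pool set dev-volumes crush_ruleset 2

Note this still shares mons and the network, so it isolates disk IO
but not everything a second physical cluster would.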
>> A separate non-production cluster will allow you to test and validate
>> new versions (including point releases within a stable series) before
>> you attempt to upgrade your production cluster.
>
> Oh yeah. I'm doing that for sure.
>
> Thanks,
> Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com