Many of us deploy Ceph as a solution for storage high availability. Over time, I've encountered a couple of moments when Ceph refused to deliver I/O to VMs even though only a tiny fraction of the PGs were stuck in non-active states due to problems on the OSDs. So I've found myself in very unpleasant situations where an entire cluster went down because of a single node, even though that cluster was supposed to be fault-tolerant.

Regardless of the reason, the cluster itself can be a single point of failure, even if it has a lot of nodes. How do you segment your deployments so that your business isn't jeopardised when your Ceph cluster misbehaves? Does anyone even use Ceph for very large clusters, or do you prefer to split everything into smaller clusters?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx