Many of us deploy Ceph as a solution for storage high availability. Over time, I've encountered a couple of moments when Ceph refused to deliver I/O to VMs even though only a tiny fraction of the PGs were stuck in non-active states due to problems on the OSDs. So I've found myself in very unpleasant situations where an entire cluster went down because of a single node, even though that cluster was supposed to be fault-tolerant.

Regardless of the reason, the cluster itself can be a single point of failure, even if it has a lot of nodes. How do you segment your deployments so that your business isn't jeopardised when your Ceph cluster misbehaves? Does anyone even use Ceph for very large clusters, or do you prefer to split everything into smaller clusters?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx