Re: Is ceph itself a single point of failure?


 



> In my setup size=2 and min_size=1

just don't.
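
If you care about the data, a sketch of the way back (pool name "rbd" is
just a placeholder here, repeat per pool) is simply the defaults:

  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2

With size=2/min_size=1, one failed OSD plus one flapping disk is already
enough to block or lose writes.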

> Real case: a host goes down, individual OSDs on other hosts start
> consuming >100GB of RAM during backfill and get OOM-killed

configuring your cluster in a better way can help here
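
As a rough sketch (numbers are illustrative, not a recommendation), capping
the OSD memory target and throttling backfill usually keeps recovery from
eating whole hosts:

  ceph config set osd osd_memory_target 4294967296   # ~4 GiB per OSD
  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1

and plan the hosts with real RAM headroom on top of that target, because
the target is a best effort, not a hard limit.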

There will never be a single system so redundant that it has 100% uptime.
As you can see on a regular basis, even big corporations like Facebook
have outages of their highly redundant systems. But there is a difference
between losing data and being unable to access your data for a short
period. You can design Ceph to be super redundant, to not lose data, and
to keep running without downtime even if one datacenter burns down. But
this all comes with costs, sometimes quite high costs. Often it is cheaper
to live with a short interruption, or to build two separate systems, than
to add more nines to the availability of a single one.
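
As a very rough sketch of the "one datacenter burns down" case (bucket and
rule names are made up, and you also need MONs in at least three locations),
the CRUSH side looks something like:

  ceph osd crush add-bucket dc1 datacenter
  ceph osd crush move dc1 root=default
  # ... same for dc2 and dc3, then move the hosts under them ...
  ceph osd crush rule create-replicated rep-3dc default datacenter
  ceph osd pool set <pool> crush_rule rep-3dc

One replica per datacenter, and the bill for the third site, the links and
the latency is exactly the kind of cost I mean above.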

--
Martin Verges
Managing director

Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx


On Mon, 22 Nov 2021 at 11:40, Marius Leustean <marius.leus@xxxxxxxxx> wrote:

> > I do not know what you mean by this; you can tune this with your min size
> > and replication. It is hard to believe that hard drives fail in exactly the
> > same PG. I wonder if this is not more related to your 'non-default' config?
>
> In my setup size=2 and min_size=1. I had cases where a single PG stuck in
> the peering state caused all the VMs in that pool to get no I/O at all. My
> setup is really "default", deployed with minimal config changes derived
> from ceph-ansible and with an even number of OSDs per host.
>
> > That is also very hard to believe, since I am updating ceph and rebooting
> > one node at a time, which is just going fine.
>
> Real case: a host goes down, individual OSDs on other hosts start
> consuming >100GB of RAM during backfill and get OOM-killed (but hey, the
> documentation says that "provisioning ~8GB per BlueStore OSD is advised.")
>
> > If you would read and investigate, you would not need to ask this
> > question.
>
> I was hoping to get some insight into other people's environments, hence
> the questions :)
>
> > Is your lack of knowledge of ceph maybe a critical issue?
>
> I'm just that poor guy who reads and understands the official documentation
> and the lists, but keeps getting hit by real-world ceph.
>
> On Mon, Nov 22, 2021 at 12:23 PM Marc <Marc@xxxxxxxxxxxxxxxxx> wrote:
>
> > >
> > > Many of us deploy ceph as a solution for storage high availability.
> > >
> > > Over time, I've encountered a couple of moments when ceph refused to
> > > deliver I/O to VMs even when only a tiny part of the PGs was stuck in
> > > non-active states due to issues on the OSDs.
> >
> > I do not know what you mean by this; you can tune this with your min
> > size and replication. It is hard to believe that hard drives fail in
> > exactly the same PG. I wonder if this is not more related to your
> > 'non-default' config?
> >
> > > So I found myself in very unpleasant situations where an entire cluster
> > > went down because of a single node, even though that cluster was supposed
> > > to be fault-tolerant.
> >
> > That is also very hard to believe, since I am updating ceph and rebooting
> > one node at a time, which is just going fine.
> >
> > >
> > > Regardless of the reason, the cluster itself can be a single point of
> > > failure, even if it has a lot of nodes.
> >
> > Indeed, like the data center, and like the planet. The question you should
> > ask yourself is: do you have a better alternative? In the 3-4 years I have
> > been using ceph, I did not find a better alternative (and I am also not
> > looking for one ;))
> >
> > > How do you segment your deployments so that your business doesn't
> > > get jeopardised when your ceph cluster misbehaves?
> > >
> > > Does anyone even use ceph for very large clusters, or do you prefer to
> > > separate everything into smaller clusters?
> >
> > If you would read and investigate, you would not need to ask this
> > question.
> > Is your lack of knowledge of ceph maybe a critical issue? I know the ceph
> > organization likes to make everything as simple as possible for everyone.
> > But this of course has its flip side when users run into serious issues.
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


