On Mon, 22 Nov 2021 at 11:40, Marius Leustean <marius.leus@xxxxxxxxx> wrote:
>
> I do not know what you mean by this, you can tune this with your min_size
> and replication. It is hard to believe that failing hard drives end up in
> exactly the same PG. I wonder if this is not more related to your
> 'non-default' config?
>
> In my setup size=2 and min_size=1. I had cases where one PG stuck in
> peering caused all the VMs in that pool to get no I/O at all. My setup is
> really "default", deployed with minimal config changes derived from
> ceph-ansible and with an even number of OSDs per host.

No, the default is size=3, min_size=2 for the very reason that you need to
be able to continue when one OSD is down. You put yourself in this position
by reducing that safety margin, and Ceph reacted by stopping writes rather
than letting you lose data. If you were afraid of losing access, you should
have tuned it in the other direction instead: size=4 or 5 with min_size=2
or 3, so you could lose two drives and still recover and keep serving I/O.

-- 
May the most significant bit of your life be positive.
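
For reference, raising the replication on an existing pool is only a couple
of commands. A rough sketch, where "vms" is a placeholder pool name (use
"ceph osd pool ls detail" to see your own pools and their current values):

    # check what the pool is set to now
    ceph osd pool get vms size
    ceph osd pool get vms min_size

    # back to the defaults; ceph will backfill the extra copies in the
    # background, so expect data movement for a while
    ceph osd pool set vms size 3
    ceph osd pool set vms min_size 2

Newly created pools take their values from osd_pool_default_size and
osd_pool_default_min_size, so adjust those in your config as well if you
plan to redeploy.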