Re: Worst thing that can happen if I have size= 2

Ceph with min_size=1 is much riskier than good old raid1:

With raid1, having disks A and B, when disk A fails, you start recovery to
a new disk A'. If disk B fails during recovery then you have a disaster.

With Ceph, we have multiple servers and multiple disks: when an OSD fails
and you replace it, it starts recovering. Because the surviving copies of
that OSD's PGs are scattered across many other OSDs, if roughly *any* other
disk in the cluster fails during that recovery window, you have a disaster.

That's the basic argument.

In more detail, OSDs are aware of a sort of "last written to" state of the
PGs on all their peers. If an OSD goes down briefly then restarts, it first
learns the PG states of its peers and starts recovering those missed
writes. The recovering OSD will not be able to serve any IO until it has
recovered the objects to their latest states. So if any of those peers
has any sort of problem during the recovery process, your cluster will be
down. "Down" in this case means precisely that the PG will be marked
incomplete and IO to it will be blocked until all of the needed OSDs are up
and running again. Experts here know how to revive a cluster in that state,
accepting and then dealing with arbitrary data loss, but Ceph won't do that
"dangerous" recovery automatically, for obvious reasons.
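For illustration, here's roughly how that shows up and how you'd move to a
safer configuration (the pool name "mypool" is just a placeholder, and the
exact output fields vary by release):

    ceph health detail              # lists PGs that are down/incomplete and the OSDs they are waiting for
    ceph pg <pgid> query            # peering/recovery state of one PG, including which OSDs it still needs
    ceph osd pool set mypool size 3      # three replicas
    ceph osd pool set mypool min_size 2  # keep serving IO with one replica missing, never with two

With size=3/min_size=2 a single failure leaves you degraded but still
writing safely; with size=2/min_size=1 that same single failure already
puts you one disk away from blocked IO or lost data.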

Here's another reference (from Wido again) that I hope will scare you away
from min_size 1:
https://www.slideshare.net/mobile/ShapeBlue/wido-den-hollander-10-ways-to-break-your-ceph-cluster

Lastly, if you can't afford 3x replicas, then use 2+2 erasure coding if
possible.
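As a rough sketch (the profile and pool names are placeholders, and you may
need explicit pg_num arguments or allow_ec_overwrites depending on your
release and use case):

    ceph osd erasure-code-profile set ec-22 k=2 m=2 crush-failure-domain=host   # needs at least 4 hosts; with fewer you'd have to fall back to an osd failure domain
    ceph osd pool create ecpool erasure ec-22
    ceph osd pool set ecpool allow_ec_overwrites true   # needed if RBD or CephFS data will live on the EC pool

That has the same 2x raw-space overhead as size=2, but a single OSD failure
leaves IO running, and even a second failure loses no data: with the default
min_size of k+1, IO just blocks until recovery instead of losing writes.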

Cheers, Dan

On Wed, Feb 3, 2021, 8:49 PM Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:

> Thanks Simon, and thanks to the other people who have replied.
> Sorry, let me try to explain myself better.
> It is clear to me that if I have two copies of the data, one breaks, and
> while Ceph is recreating that copy the disk with the second copy also
> breaks, then the data is lost.
> That is obvious, and a bit paranoid: many servers at many customers run on
> raid1, so you are effectively saying "you have two copies of the data, but
> you can lose both". Consider also that in Ceph recovery is automatic,
> whereas with raid1 someone must go to the customer and replace disks by
> hand. So Ceph is already an improvement in this case, even with size=2.
> With size 3 and min 2 it is a bigger improvement, I know.
>
> What I am asking is this: what happens with min_size=1 and a split brain,
> a network outage or similar events? Does Ceph block writes because it has
> no quorum on the monitors? Are there failure scenarios that I have not
> considered?
> Thanks again!
> Mario
>
>
>
> On Wed, 3 Feb 2021 at 17:42, Simon Ironside <sironside@xxxxxxxxxxxxx>
> wrote:
>
> > On 03/02/2021 09:24, Mario Giammarco wrote:
> > > Hello,
> > > Imagine this situation:
> > > - 3 servers with ceph
> > > - a pool with size 2 min 1
> > >
> > > I know perfectly well that size 3 and min 2 is better.
> > > I would like to know what is the worst thing that can happen:
> >
> > Hi Mario,
> >
> > This thread is worth a read; it's an oldie but a goodie:
> >
> >
> >
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
> >
> > Especially this post, which helped me understand the importance of
> > min_size=2
> >
> >
> >
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014892.html
> >
> > Cheers,
> > Simon
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx