Ceph with min_size 1 is several times riskier than good old raid1.

With raid1, having disks A and B: when disk A fails, you start recovery to a new disk A'. If disk B fails during that recovery, you have a disaster. With Ceph we have multiple servers and multiple disks: when an OSD fails and you replace it, it starts recovering, and during that recovery window, if roughly *any* other disk in the cluster fails, you have a disaster. That's the basic argument.

In more detail: OSDs are aware of a sort of "last written to" state of the PGs on all their peers. If an OSD goes down briefly and then restarts, it first learns the PG states of its peers and starts recovering the writes it missed. The recovering OSD will not be able to serve any IO until it has recovered the objects to their latest states. So if any of those peers has any sort of problem during the recovery process, your cluster will be down. "Down" in this case means precisely that the PG will be marked incomplete and IO will be blocked until all needed OSDs are up and running. Experts here know how to revive a cluster in that state, accepting and then dealing with arbitrary data loss, but Ceph won't do that "dangerous" recovery automatically, for obvious reasons.

Here's another reference (from Wido again) that I hope will scare you away from min_size 1:
https://www.slideshare.net/mobile/ShapeBlue/wido-den-hollander-10-ways-to-break-your-ceph-cluster

Lastly, if you can't afford 3x replicas, then use 2+2 erasure coding if possible.
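For example, moving an existing replicated pool to size 3 / min_size 2, or creating a 2+2 EC pool instead, looks roughly like this. The pool name, profile name and PG counts below are just placeholders; check the failure domain and PG numbers against your own cluster before running anything:

  # replicated pool: keep 3 copies, refuse IO when fewer than 2 are available
  ceph osd pool set mypool size 3
  ceph osd pool set mypool min_size 2

  # 2+2 erasure coding: same 2x overhead as size=2 replication, but you can
  # lose two OSDs without losing data
  # (needs at least k+m=4 hosts if the failure domain is host)
  ceph osd erasure-code-profile set ec-2-2 k=2 m=2 crush-failure-domain=host
  ceph osd pool create mypool-ec 64 64 erasure ec-2-2

  # verify the settings took effect
  ceph osd pool ls detail

On recent releases the EC pool will default to min_size = k+1 (3 here), which is exactly the "one spare beyond the bare minimum" behaviour you want.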
Cheers,
Dan

On Wed, Feb 3, 2021, 8:49 PM Mario Giammarco <mgiammarco@xxxxxxxxx> wrote:

> Thanks Simon, and thanks to the other people that have replied.
> Sorry, let me try to explain myself better.
> It is evident to me that if I have two copies of data, one breaks, and
> while Ceph is creating a new copy of the data the disk with the second
> copy also breaks, you lose the data.
> It is obvious, and a bit paranoid, because many servers at many customers
> run on raid1, so you are saying: yes, you have two copies of the data, but
> you can break both. Consider that in Ceph recovery is automatic, while with
> raid1 someone must manually go to the customer and change disks. So Ceph is
> already an improvement in this case, even with size=2. With size 3 and min
> 2 it is a bigger improvement, I know.
>
> What I am asking is this: what happens with min_size=1 and split brain,
> network down or similar things: does Ceph block writes because it has no
> quorum on the monitors? Are there some failure scenarios that I have not
> considered?
> Thanks again!
> Mario
>
> On Wed, Feb 3, 2021 at 5:42 PM Simon Ironside <sironside@xxxxxxxxxxxxx>
> wrote:
>
> > On 03/02/2021 09:24, Mario Giammarco wrote:
> > > Hello,
> > > Imagine this situation:
> > > - 3 servers with ceph
> > > - a pool with size 2 min 1
> > >
> > > I know perfectly well that size 3 and min 2 is better.
> > > I would like to know what is the worst thing that can happen:
> >
> > Hi Mario,
> >
> > This thread is worth a read, it's an oldie but a goodie:
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014846.html
> >
> > Especially this post, which helped me understand the importance of
> > min_size=2:
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014892.html
> >
> > Cheers,
> > Simon