Guys, Ceph does not have a concept of "osd quorum" or "electing a primary
PG". The mons are in a PAXOS quorum, and the mon leader decides which OSD
is primary for each PG. No need to worry about a split OSD brain.

-- dan

On Thu, Mar 29, 2018 at 2:51 PM, Peter Linder <peter.linder@xxxxxxxxxxxxxx> wrote:
>
> Den 2018-03-29 kl. 14:26, skrev David Rabel:
> > On 29.03.2018 13:50, Peter Linder wrote:
> > > Den 2018-03-29 kl. 12:29, skrev David Rabel:
> > > > On 29.03.2018 12:25, Janne Johansson wrote:
> > > > > 2018-03-29 11:50 GMT+02:00 David Rabel <rabel@xxxxxxxxxxxxx>:
> > > > > >
> > > > > > You are right. But with my above example: if I have min_size 2 and
> > > > > > size 4, and because of a network issue the 4 OSDs are split into 2
> > > > > > and 2, is it possible that I have write operations on both sides
> > > > > > and therefore have inconsistent data?
> > > > >
> > > > > You always write to the primary, which in turn sends copies to the 3
> > > > > others, so in the 2+2 split case only one side can talk to the
> > > > > primary OSD for that PG, so writes will just happen on one side at
> > > > > most.
> > >
> > > I'm not sure that this is true: will the side that doesn't have the
> > > primary not simply elect a new one when min_size=2 and there are 2 of
> > > [failure domain] available? This is assuming that there are enough mons
> > > as well.
> > >
> > > Even if this is the case, only half of the PGs would be available and
> > > operations will stop.
> >
> > Why is this? If min_size is 2 and 2 OSDs are available, operations should
> > not stop. Or am I wrong here?
>
> Yes, but there are 2 OSDs available per PG per side of the partition, so 2
> separate active clusters. If there is a different write to both of them, it
> will be accepted, and it will not be possible to heal the cluster later,
> when the network issue is resolved, because of the inconsistency.
>
> Even if it were a 50/50 chance on which side a PG would be active (going by
> the original primary), it would mean trouble, as many writes could not
> complete, but I don't think this is the case.
>
> You will have to take mon quorum into account as well, of course; that is
> outside the scope of my post. The best thing, I believe, is to have an
> uneven number of everything. I don't know if you can have 4 OSD hosts plus
> a 5th node just for quorum, but I suppose it would be worth it if the extra
> quorum node could not fail at the same time as 2 of the hosts.
>
> > David
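
For anyone who wants to double-check this on a test cluster, here is a rough
Python sketch (not from the thread itself) that just shells out to the ceph
CLI and prints the mon quorum, the primary OSD the mons have assigned to a
PG, and a pool's min_size. The PG id "1.0", the pool name "rbd", and the
exact JSON keys used ("quorum_leader_name", "acting_primary", "min_size")
are my assumptions; substitute your own ids and check the keys against your
release.

    #!/usr/bin/env python3
    # Rough sketch: ask the mons who is in quorum, who leads it, and which
    # OSD they currently consider primary for a given PG. Assumes the
    # 'ceph' CLI is on PATH and that the JSON keys below exist in your
    # release (compare with the plain-text output if they do not).
    import json
    import subprocess

    def ceph(*args):
        """Run a ceph CLI command and return its parsed JSON output."""
        out = subprocess.check_output(["ceph", *args, "--format", "json"])
        return json.loads(out)

    # The mons form the (Paxos) quorum; one of them is the leader.
    qs = ceph("quorum_status")
    print("mon quorum:", qs["quorum_names"], "leader:", qs["quorum_leader_name"])

    # For any PG, the osdmap published by the mons names the acting set and
    # the primary OSD; clients write via that primary only.
    pgid = "1.0"  # placeholder PG id; pick a real one from 'ceph pg ls'
    pm = ceph("pg", "map", pgid)
    print("pg", pgid, "acting:", pm["acting"], "primary: osd.%d" % pm["acting_primary"])

    # min_size only gates how many replicas must be in the acting set before
    # the PG serves I/O; it does not create a second, independent primary.
    pool = "rbd"  # placeholder pool name
    ms = ceph("osd", "pool", "get", pool, "min_size")
    print("pool", pool, "min_size:", ms["min_size"])

The point the output illustrates: in a 2+2 partition, the side that cannot
reach the mon quorum never receives an osdmap naming a new primary, so
writes cannot continue independently on both sides.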