Re: [SOLVED] Replicated pool with an even size - does min_size have to be bigger than half the size?

On 2018-03-29 at 14:26, David Rabel wrote:
On 29.03.2018 13:50, Peter Linder wrote:
On 2018-03-29 at 12:29, David Rabel wrote:
On 29.03.2018 12:25, Janne Johansson wrote:
2018-03-29 11:50 GMT+02:00 David Rabel <rabel@xxxxxxxxxxxxx>:
You are right. But with my above example: If I have min_size 2 and size
4, and because of a network issue the 4 OSDs are split into 2 and 2, is
it possible that I have write operations on both sides and therefore
have inconsistent data?

You always write to the primary, which in turn sends copies to the 3 others. So in the 2+2 split case, only one side can talk to the primary OSD for that PG, and writes will happen on one side at most.
I'm not sure that this is true. Won't the side that doesn't have the primary simply elect a new one when min_size=2 and there are 2 of [failure domain] available? This is assuming that there are enough mons as well.

Even if this is the case, only half of the PGs would be available and operations would stop.
Why is this? If min_size is 2 and 2 PGs are available, operations should
not stop. Or am I wrong here?
Yes, but there are 2 OSDs available per PG on each side of the partition, so you effectively get 2 separate active clusters. If a different write is accepted on each side, it will not be possible to heal the cluster later, when the network issue is resolved, because the copies are inconsistent.
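
Just to make the setup concrete, the pool shape being discussed would be set roughly like this ("mypool" is only a placeholder name):

    # replicated pool keeping 4 copies, accepting I/O while at least 2 are up
    ceph osd pool set mypool size 4
    ceph osd pool set mypool min_size 2

    # double-check the values
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

With size 4 and min_size 2, the two replicas left on either side of a 2+2 split are enough to satisfy min_size, which is what the scenario above hinges on.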

Even if it were a 50/50 chance which side a PG would be active on (going by the original primary), it would still mean trouble, since many writes could not complete - but I don't think this is the case.
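
If you want to see where a given PG's primary actually is, something along these lines works (the PG id, pool and object names below are made-up examples):

    # up/acting set for a PG; the first OSD listed in the acting set is the primary
    ceph pg map 1.0

    # or start from an object name: the "p<N>" in the output is the primary OSD
    ceph osd map mypool someobject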

You will have to take mon quorum into account as well, of course; that is outside the scope of my post. The best thing, I believe, is to have an odd number of everything. I don't know if you can have 4 OSD hosts and a 5th node just for quorum, but I suppose it would be worth it if the extra quorum node cannot fail at the same time as 2 of the hosts.
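
For the mon side, it is at least easy to check whether you still have quorum and who is part of it:

    # quorum details (which mons are currently in quorum, plus the monmap)
    ceph quorum_status --format json-pretty

    # short summary of monitors and the current quorum
    ceph mon stat

An odd number of mons is what lets one side of a partition still hold a majority.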


David




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
