No, there is no split brain problem even with size/min_size 2/1. A PG will not go active if it might be missing the latest data, i.e. if all other OSDs that could have seen newer writes are currently offline.
That's what the osd_find_best_info_ignore_history_les option effectively does: it tells Ceph to take such a PG active anyway in that situation.
That's also why you end up with inactive PGs if you run 2/1 and a disk dies while OSDs are flapping. You then have to set osd_find_best_info_ignore_history_les if the dead disk really is unrecoverable, losing the latest modifications to the affected objects.
But Ceph will not compromise your data without you explicitly telling it to do so; it will just block IO instead.
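For reference, a rough sketch of what that last-resort procedure looks like (the pgid is a placeholder, "ceph config set" needs Mimic or newer, and double-check the exact option name on your release; on older clusters you'd put it in ceph.conf and restart the affected OSDs):

    # see which PGs are stuck inactive/incomplete and why
    ceph health detail
    ceph pg <pgid> query

    # last resort only, after confirming the failed disk is gone for good:
    # let the surviving OSDs take the PG active without the latest history
    ceph config set osd osd_find_best_info_ignore_history_les true

    # once the PGs have peered and recovery has finished, turn it off again
    ceph config set osd osd_find_best_info_ignore_history_les false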
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Mon, May 20, 2019 at 10:04 PM Frank Schilder <frans@xxxxxx> wrote:
If min_size=1 and you lose the last disk, that's the end of any data that was only on this disk.
Apart from this, using size=2 and min_size=1 is a really bad idea. This has nothing to do with data replication as such, but rather with an inherent problem of high availability and the number 2: you need at least 3 members in an HA group to ensure stable operation with proper majorities. There are numerous stories about OSD flapping caused by size=2/min_size=1 pools, leading to situations that are extremely hard to recover from. My favourite is this one: https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/ . You will easily find more. The deeper problem here is called "split-brain", and there is no real solution to it except avoiding it at all costs.
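For anyone following along who wants to check or fix this on an existing replicated pool, the commands are roughly as follows (a sketch; "mypool" is a placeholder for your pool name):

    ceph osd pool get mypool size
    ceph osd pool get mypool min_size
    # move to the commonly recommended 3/2
    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2

Note that raising size to 3 will trigger backfill and requires the capacity for a third copy.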
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Florent B <florent@xxxxxxxxxxx>
Sent: 20 May 2019 21:33
To: Paul Emmerich; Frank Schilder
Cc: ceph-users
Subject: Re: Default min_size value for EC pools
I understand better thanks to Frank & Paul messages.
Paul, when min_size=k, is it the same problem as with a replicated pool with size=2 & min_size=1?
On 20/05/2019 21:23, Paul Emmerich wrote:
Yeah, the current situation with recovery and min_size is... unfortunate :(
The reason why min_size = k is bad is just that it means you are accepting writes without guaranteeing durability while you are in a degraded state.
A durable storage system should never tell a client "okay, I've written your data" if losing a single disk can then lead to data loss.
Yes, that is the default behavior of traditional RAID 5 and RAID 6 systems during rebuild (with 1 or 2 disk failures respectively), but that doesn't mean it's a good idea.
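In practice that means keeping min_size at k+1 on EC pools. A rough sketch with made-up names, using a 4+2 profile so k+1 = 5:

    # example 4+2 erasure-code profile and pool (names are placeholders)
    ceph osd erasure-code-profile set ec42 k=4 m=2
    ceph osd pool create ecpool 128 128 erasure ec42

    # check and, if needed, raise min_size to k+1 = 5 so a degraded pool
    # stops accepting writes before it is down to zero remaining redundancy
    ceph osd pool get ecpool min_size
    ceph osd pool set ecpool min_size 5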
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90