On Fri, Sep 23, 2016 at 9:29 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
> > On 23 September 2016 at 9:11, Tomasz Kuzemko <tomasz.kuzemko@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > The biggest issue with replica size 2 is that if you find an inconsistent
> > object, you will not be able to tell which copy is the correct one. With
> > replica size 3 you can assume that the 2 copies that agree are correct.
> >
> > Until Ceph guarantees stored data integrity (that is, until we have a
> > production-ready BlueStore), I would not go with replica size 2.
>
> Not only that, but the same can happen if you have flapping OSDs.
>
> OSDs 0 and 1 share a PG. 0 goes down; 1 is up and acting and accepts
> writes. Now 1 goes down and 0 comes up. 0 becomes primary, but the PG is
> 'down' because 1 had the most recent data. You really need 1 to come
> back in this case before the PG will work again.
>
> I have seen this happen multiple times on systems that got overloaded.
>
> If you care about your data, you run with size = 3 and min_size = 2.
>
> Wido

FWIW, when Intel presented their reference architectures at Ceph Day
Switzerland, their "IOPS-Optimized" config had 2 replicas on "Intel SSD
DC Series". I guess they trust their hardware.

But personally, even if I were forced to run 2x replicas, I'd try to use
size=2, min_size=2.

--
Dan
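
P.S. For anyone who wants to apply Wido's advice, both settings can be
changed on a live pool. A minimal sketch, assuming a pool named "mypool"
(substitute your own pool name):

    # keep 3 copies of every object, and refuse I/O to a PG
    # unless at least 2 copies are currently available
    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2

    # verify the settings
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

Bear in mind that raising size from 2 to 3 kicks off backfill to create
the third copy, so expect recovery traffic until the cluster converges.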
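
P.P.S. On Tomasz's point about inconsistent objects: you can see the
tie-breaking problem directly on Jewel and later. When scrub flags a PG
inconsistent, something like the following shows the conflicting shards
(the PG id 1.2f here is just a placeholder), but with only two copies
there is no majority to tell you which one is good:

    # find PGs that scrubbing has flagged as inconsistent
    ceph health detail | grep inconsistent

    # inspect the conflicting copies within a flagged PG
    rados list-inconsistent-obj 1.2f --format=json-pretty

    # 'ceph pg repair 1.2f' would then overwrite the replica with the
    # primary's copy, which is not necessarily the correct one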