Re: rbd pool: replica size choice: 2 vs 3


 



On Fri, Sep 23, 2016 at 9:29 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>
> > On 23 September 2016 at 9:11, Tomasz Kuzemko <tomasz.kuzemko@xxxxxxxxxxxx> wrote:
> >
> >
> > Hi,
> >
> > The biggest issue with replica size 2 is that if you find an inconsistent
> > object, you will not be able to tell which copy is the correct one. With
> > replica size 3 you can assume that the two copies that agree are the
> > correct ones.
> >
> > Until Ceph guarantees stored data integrity (that is - until we have
> > production-ready Bluestore), I would not go with replica size 2.
> >
>
> Not only that, but the same could happen if you have flapping OSDs.
>
> OSD 0 and 1 share a PG.
>
> OSD 0 goes down; OSD 1 is up, acting, and accepts writes. Now 1 goes down and 0 comes back up. 0 becomes primary, but the PG is 'down' because 1 holds the latest data. In this case you really need 1 to come back before the PG will work again.
>
> I have seen this happen multiple times in systems which got overloaded.
>
> If you care about your data, you run with size = 3 and min_size = 2.
>
> Wido
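
(For anyone who wants to spot Wido's scenario on a live cluster: a PG stuck
like that shows up in the health output, and querying it tells you which OSD
it is waiting for. A minimal sketch; the pg id below is just a placeholder:

    ceph health detail      # lists PGs that are down/incomplete and why
    ceph pg <pgid> query    # the recovery_state section shows which OSD must come back

And min_size = 2 is what prevents the situation in the first place: once only
one replica is left, the PG stops accepting writes instead of letting the
copies diverge.)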

FWIW, when Intel presented their reference architectures at Ceph Day
Switzerland, their "IOPS-Optimized" config had 2 replicas on "Intel
SSD DC Series".

I guess they trust their hardware. But personally, even if I were forced
to run 2x replicas, I'd try to use size=2, min_size=2.
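
For reference, both knobs are per-pool and can be changed on a running
cluster. This is only a sketch; 'rbd' is just an assumed pool name here:

    # Wido's recommendation: three copies, block I/O once fewer than two are available
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

    # The 2x variant above: two copies, but still refuse I/O with only one copy up
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 2

    # Check what a pool currently uses
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

The cost of size=2, min_size=2 is availability: losing a single OSD drops the
PG below min_size, so it stops serving I/O until a second copy is back. That
is the consistency-over-availability trade-off implied above.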

-- Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




