Re: why was osd pool default size changed from 2 to 3.

Hello,

There have been COUNTLESS discussions about Ceph reliability, fault
tolerance and so forth in this very ML. 
Google is very much evil, but in this case it is your friend. 

In those threads you will find several reliability calculators, some more
flawed than others, but ultimately you do not use a replica count of 2 for
the same reasons people don't use RAID5 for anything valuable.
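
For illustration, here is a minimal back-of-the-envelope sketch of what such
a calculator does (Python; the failure rate, recovery window and PG fan-out
numbers below are made-up assumptions, not measurements). It also illustrates
Corin's point further down: the risk grows with the number of OSDs, and size 3
buys you far more margin than size 2.

    # Back-of-the-envelope data-loss estimate for replica size 2 vs 3.
    # All numbers (AFR, recovery window, PG fan-out) are illustrative
    # assumptions -- plug in your own hardware's figures.

    AFR = 0.04              # assumed annual failure rate of a single OSD/disk
    RECOVERY_HOURS = 24.0   # assumed time to re-replicate a failed OSD's data
    HOURS_PER_YEAR = 24 * 365

    def p_fail(hours):
        # Probability that one OSD fails within `hours` (crude linear model).
        return AFR * hours / HOURS_PER_YEAR

    def p_loss_per_year(n_osds, size, max_peers=100):
        # Rough yearly chance of losing a PG: after one OSD dies, data is
        # lost if (size - 1) of the OSDs sharing PGs with it also die
        # before recovery finishes.
        peers = min(max_peers, n_osds - 1)
        p_overlap = 1 - (1 - p_fail(RECOVERY_HOURS)) ** peers
        expected_failures = n_osds * AFR    # first failures per year
        return expected_failures * p_overlap ** (size - 1)

    for n in (10, 100, 1000):
        for size in (2, 3):
            print(f"{n:5d} OSDs, size={size}: ~{p_loss_per_year(n, size):.1e} per year")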

A replication factor of 2 MAY be fine with very reliable, fast and not too
large SSDs, but that's about it.
Spinning rust is never safe when a single failure leaves you with just one copy.

Christian

On Sat, 24 Oct 2015 09:41:35 +0200 Stefan Eriksson wrote:

> > On 23.10.2015 at 20:53, Gregory Farnum wrote:
> >> On Fri, Oct 23, 2015 at 8:17 AM, Stefan Eriksson <stefan@xxxxxxxxxxx>
> >> wrote:
> >>
> >> Nothing changed to make two copies less secure. 3 copies is just so
> >> much more secure and is the number that all the companies providing
> >> support recommend, so we changed the default.
> >> (If you're using it for data you care about, you should really use 3
> >> copies!)
> >> -Greg
> >
> > I assume that number really depends on the (number of) OSDs you have in
> > your crush rule for that pool. A replication of 2 might be ok for a pool
> > spread over 10 osds, but not for one spread over 100 osds....
> >
> > Corin
> >
> 
> I'm also interested in this: what changes when you add 100+ OSDs (to
> warrant 3 replicas instead of 2), and what is the reasoning behind "the
> companies providing support recommend 3"?
> Theoretically it seems secure enough to have two replicas.
> If you have 100+ OSDs, I can see that maintenance will take much longer,
> and if you use "set noout" then only a single copy of each affected PG
> stays active while the other replica is under maintenance.
> But if you "crush reweight to 0" before the maintenance this would not be
> an issue.
> Is this the main reason?
> 
> From what I can gather, even if you add new OSDs to the cluster and the
> rebalancing kicks in, it still maintains its two replicas.
> 
> thanks.
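
As for the maintenance angle above, a minimal sketch of the two approaches
mentioned (driving the stock ceph CLI from Python; the osd id, weight and
helper names are just illustrative, not an official tool):

    import subprocess

    def ceph(*args):
        # Thin wrapper around the stock `ceph` CLI.
        subprocess.run(["ceph", *args], check=True)

    def maintenance_with_noout():
        # Approach 1: "set noout" -- the down OSD is not marked out, so no
        # data moves, but its PGs run degraded (one copy with size=2)
        # until it comes back.
        ceph("osd", "set", "noout")
        # ... stop the OSD daemon, do the hardware work, start it again ...
        ceph("osd", "unset", "noout")

    def maintenance_with_reweight(osd_id, original_weight="1.0"):
        # Approach 2: "crush reweight to 0" -- data is re-replicated off the
        # OSD first, so the pool never drops below its configured replica count.
        ceph("osd", "crush", "reweight", f"osd.{osd_id}", "0")
        # ... wait for HEALTH_OK, do the maintenance, then restore the weight ...
        ceph("osd", "crush", "reweight", f"osd.{osd_id}", original_weight)

The trade-off being that noout keeps the data where it is but leaves the
affected PGs with fewer copies for the whole window, while reweighting to 0
shuffles the data twice but never drops below the pool's replica count.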


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


