On Tue, 28 Oct 2014 07:46:30 +0000 Dan Van Der Ster wrote:

>
> > On 28 Oct 2014, at 08:25, Robert van Leeuwen
> > <Robert.vanLeeuwen@xxxxxxxxxxxxx> wrote:
> >
> >> For now we have decided on a SuperMicro SKU with 72 bays per server:
> >> 22 SSDs + 50 SATA drives.
> >> Our racks can hold 10 of these servers, and with 50 such racks the
> >> Ceph cluster would have 36000 OSDs.
> >> With 4 TB SATA drives, replica = 2 and a nearfull ratio of 0.8 we get
> >> 40 petabytes of usable capacity.
> >>
> >> Is that too big, or a normal use case for Ceph?
> >
> > I'm a bit worried about the replica count:
> > The chance of 2 disks out of 25000 failing at the same time becomes
> > very significant (or a disk + server failure). Without doing any math,
> > my gut feeling says that even 3 replicas is still not very comfortable
> > (especially if the disks come from the same batch).
>
> It doesn't quite work like that. You're not going to lose data if _any_
> two disks out of 25000 fail. You'll only lose data if two disks that are
> coupled in a PG are lost. So, while there are 25000^2 ways to lose two
> disks, there are only nPGs disk pairs that matter for data loss. Said
> another way, suppose you have one disk failed; what is the probability
> of losing data? Well, the data loss scenario is going to happen if one
> of the ~100 disks coupled with the failed disk also fails. So you see,
> the chance of data loss with 2 replicas is roughly equivalent whether
> you have 1000 OSDs or 25000 OSDs.
>
We keep having that discussion here and are still lacking a fully
realistic model for this scenario. ^^
Though I seem to recall work is being done on the Ceph reliability
calculator.

Let's just say that with a replica of 2 and a set of 100 disks, all the
models and calculators I checked predict a data loss within a year.
That data-loss probability goes down from 99.99% to just 0.04% per year
(which I would still consider too high) with a replica of 3.
That's why I never use more than 22 HDDs in a RAID6 and keep this at
10-12 for anything mission critical.

And having likely multiple (even if unrelated) OSD failures at the same
time can't be good for recovery times (increased risk) and cluster
performance either.

Christian

> Cheers, Dan
>
> > Cheers,
> > Robert van Leeuwen

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
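
For reference, a back-of-the-envelope check of the 40 PB figure quoted
above. This is only a sketch; the drive count, drive size, replica count
and nearfull ratio are taken straight from the original post, and the
SSDs are left out of the capacity math.

# Sanity check of the usable-capacity figure from the original post.
sata_osds = 50 * 10 * 50      # 50 SATA drives/server, 10 servers/rack, 50 racks
drive_tb = 4                  # 4 TB per SATA drive
replicas = 2
nearfull = 0.8

raw_pb = sata_osds * drive_tb / 1000.0        # ~100 PB raw
usable_pb = raw_pb / replicas * nearfull      # ~40 PB usable
print(f"{sata_osds} OSDs -> {raw_pb:.0f} PB raw, {usable_pb:.0f} PB usable")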
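
And a minimal sketch of the failure argument discussed in the thread:
with 2 replicas, data loss requires one of the ~100 PG peers of an
already-failed OSD to fail before recovery completes. The annual failure
rate and recovery window below are illustrative assumptions, not
measured values, and failures are treated as independent.

# Rough model of the 2-replica data-loss scenario described above.
AFR = 0.04                # assumed annual failure rate per disk (4%)
RECOVERY_HOURS = 24.0     # assumed time to re-replicate a failed OSD
PEERS = 100               # ~100 disks share a PG with any given OSD
HOURS_PER_YEAR = 24 * 365

# P(one particular peer fails during the recovery window)
p_peer = AFR * RECOVERY_HOURS / HOURS_PER_YEAR
# P(at least one of the ~100 peers fails during that window)
p_loss_given_failure = 1 - (1 - p_peer) ** PEERS
print(f"P(loss | one OSD already failed) ~= {p_loss_given_failure:.2%}")

# This conditional probability does not depend on cluster size, as Dan
# points out, but the number of "first" failures per year scales with N:
N_OSDS = 25000
expected_failures = N_OSDS * AFR
p_loss_per_year = 1 - (1 - p_loss_given_failure) ** expected_failures
print(f"P(data loss within a year, replica 2) ~= {p_loss_per_year:.2%}")

Under these assumed numbers the per-incident risk is only about 1%, but
a 25000-OSD cluster sees on the order of a thousand such incidents per
year, which is why the annual loss probability with 2 replicas ends up
close to certainty, in line with the figures Christian mentions.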