On Tue, 28 Oct 2014 07:46:30 +0000 Dan Van Der Ster wrote:

>
> > On 28 Oct 2014, at 08:25, Robert van Leeuwen
> > <Robert.vanLeeuwen@xxxxxxxxxxxxx> wrote:
> >
> >> For now we have decided on a SuperMicro SKU with 72 bays per server:
> >> 22 SSDs + 50 SATA drives.
> >> Our racks can hold 10 of these servers, and with 50 such racks the
> >> Ceph cluster would have 36000 OSDs.
> >> With 4 TB SATA drives, replica = 2 and a nearfull ratio of 0.8 we get
> >> 40 petabytes of usable capacity.
> >>
> >> Is that too big, or a normal use case for Ceph?
> >
> > I'm a bit worried about the replica count:
> > The chance of 2 disks out of 25000 failing at the same time becomes
> > very significant (or a disk + server failure). Without doing any math,
> > my gut feeling says that even 3 replicas is still not very comfortable
> > (especially if the disks come from the same batch).
>
> It doesn't quite work like that. You're not going to lose data if _any_
> two disks out of 25000 fail. You'll only lose data if two disks that are
> coupled in a PG are lost. So, while there are 25000^2 ways to lose two
> disks, there are only nPGs disk pairs that matter for data loss. Said
> another way, suppose you have one disk failed; what is the probability
> of losing data? Well, the data loss scenario is going to happen if one
> of the ~100 disks coupled with the failed disk also fails. So you see,
> the chance of data loss with 2 replicas is roughly equivalent whether
> you have 1000 OSDs or 25000 OSDs.
>
We keep having that discussion here and are still lacking a fully
realistic model for this scenario. ^^
Though I seem to recall work is being done on the Ceph reliability
calculator.

Let's just say that with a replica of 2 and a set of 100 disks, all the
models and calculators I checked predict a data loss within a year.
That data-loss probability goes down from 99.99% to just 0.04% per year
(which I would still consider too high) with a replica of 3.
That's why I never use more than 22 HDDs in a RAID6 and keep this at
10-12 for anything mission critical.

And having likely multiple (even if unrelated) OSD failures at the same
time can't be good for recovery times (increased risk) and cluster
performance either.

Christian

> Cheers, Dan
>
> > Cheers,
> > Robert van Leeuwen

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
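
For reference, a back-of-the-envelope check of the 40 PB figure quoted
above. This is only a sketch; the drive count, drive size, replica count
and nearfull ratio are taken straight from the original post, and the
SSDs are left out of the capacity math.

# Sanity check of the usable-capacity figure from the original post.
sata_osds = 50 * 10 * 50      # 50 SATA drives/server, 10 servers/rack, 50 racks
drive_tb = 4                  # 4 TB per SATA drive
replicas = 2
nearfull = 0.8

raw_pb = sata_osds * drive_tb / 1000.0        # ~100 PB raw
usable_pb = raw_pb / replicas * nearfull      # ~40 PB usable
print(f"{sata_osds} OSDs -> {raw_pb:.0f} PB raw, {usable_pb:.0f} PB usable")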
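
And a minimal sketch of the failure argument discussed in the thread:
with 2 replicas, data loss requires one of the ~100 PG peers of an
already-failed OSD to fail before recovery completes. The annual failure
rate and recovery window below are illustrative assumptions, not
measured values, and failures are treated as independent.

# Rough model of the 2-replica data-loss scenario described above.
AFR = 0.04                # assumed annual failure rate per disk (4%)
RECOVERY_HOURS = 24.0     # assumed time to re-replicate a failed OSD
PEERS = 100               # ~100 disks share a PG with any given OSD
HOURS_PER_YEAR = 24 * 365

# P(one particular peer fails during the recovery window)
p_peer = AFR * RECOVERY_HOURS / HOURS_PER_YEAR
# P(at least one of the ~100 peers fails during that window)
p_loss_given_failure = 1 - (1 - p_peer) ** PEERS
print(f"P(loss | one OSD already failed) ~= {p_loss_given_failure:.2%}")

# This conditional probability does not depend on cluster size, as Dan
# points out, but the number of "first" failures per year scales with N:
N_OSDS = 25000
expected_failures = N_OSDS * AFR
p_loss_per_year = 1 - (1 - p_loss_given_failure) ** expected_failures
print(f"P(data loss within a year, replica 2) ~= {p_loss_per_year:.2%}")

Under these assumed numbers the per-incident risk is only about 1%, but
a 25000-OSD cluster sees on the order of a thousand such incidents per
year, which is why the annual loss probability with 2 replicas ends up
close to certainty, in line with the figures Christian mentions.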