Re: calculating maximum number of disk and node failures that can be handled by a cluster without data loss


 



Hello,

As always, this has been discussed in the past, with people taking various
bits of "truth" away from it. As for a precise failure model, the latest one
is here: https://wiki.ceph.com/Development/Reliability_model/Final_report

And there is also this one, which the last time it came up I felt didn't
take everything into account (especially realistic times to full recovery):
https://github.com/ceph/ceph-tools/tree/master/models/reliability

The more disks you have, the more likely a triple failure becomes.
OTOH, as Dan pointed out, the chance of those disks sharing PGs goes down.
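
For anyone who wants to check their own cluster rather than guess: here is a
quick-and-dirty sketch (untested on my end). It assumes you pipe the plain
"ceph pg dump" output in on stdin, greps the [x,y,z] acting sets out of it,
and uses a made-up set of failed OSD ids:

import re
import sys

# Made-up example: the OSD ids that failed at the same time.
failed = {3, 7, 11}

# Matches any [n,n,...] list in the dump output; good enough for a rough check.
acting_re = re.compile(r'\[(\d+(?:,\d+)*)\]')

doomed = 0
for line in sys.stdin:
    for group in acting_re.findall(line):
        osds = {int(x) for x in group.split(',')}
        if osds <= failed:   # every listed replica sits on a failed OSD
            doomed += 1
            break            # don't count a PG twice (up + acting sets)

print("PGs with all replicas on the failed OSDs:", doomed)

Run it as something like "ceph pg dump | python pg-overlap.py" (file name made
up); anything above zero for your failure scenario means lost objects.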

In the OP's example of 4 nodes with 4 disks each, having 3 disks fail on 3
different nodes at the same time will certainly lose data.
In a cluster of 100 nodes with 12 disks each that certainty becomes a
probability. How much of one, I'll leave to people who feel comfortable
with those levels of math.
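
That said, for a very rough ballpark you don't need heavy math, just heavy
simplifications. Purely as a back-of-the-envelope sketch of mine (assuming
each PG picks 3 distinct hosts uniformly and one OSD per host uniformly,
which is NOT what CRUSH actually does, and the PG counts are my guesses):

def p_data_loss(hosts, osds_per_host, num_pgs):
    # All 3 replicas of one particular PG land on the 3 failed OSDs:
    # the PG must map to exactly the 3 failed hosts (1 / C(hosts, 3))
    # and pick the failed OSD on each of them (1 / osds_per_host each).
    host_triples = hosts * (hosts - 1) * (hosts - 2) / 6.0
    p_one_pg = 1.0 / (host_triples * osds_per_host ** 3)
    # Chance that at least one of num_pgs (treated as independent) PGs is hit.
    return 1.0 - (1.0 - p_one_pg) ** num_pgs

print(p_data_loss(4, 4, 512))       # ~0.87 -> "certainly" is about right
print(p_data_loss(100, 12, 40000))  # ~1.4e-4 -> very unlikely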

Note that I consider a node failure something that might stop my Ceph
cluster from working (if PGs were to drop below min_size), but doesn't
result in data loss.
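
To spell that out for a single PG whose 3 replicas live on 3 different hosts
(a toy illustration of mine with the defaults size=3/min_size=2, worst case
where every failed host holds one of that PG's replicas):

size, min_size = 3, 2

for hosts_down in range(4):
    replicas_left = size - hosts_down  # one replica per host, each failed host held one
    if replicas_left <= 0:
        state = "data loss (all replicas gone)"
    elif replicas_left < min_size:
        state = "I/O blocks until recovery, but no data loss"
    else:
        state = "I/O continues"
    print(hosts_down, "host(s) down ->", replicas_left, "replica(s) left:", state)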

Christian

On Wed, 10 Jun 2015 09:55:22 +0200 Dan van der Ster wrote:

> This is a CRUSH misconception. Triple drive failures only cause data
> loss when they share a PG (e.g. ceph pg dump .. those [x,y,z] triples
> of OSDs are the only ones that matter). If you have very few OSDs,
> then it's possibly true that any combination of disks would lead to
> failure. But as you increase the number of OSDs, the likelihood of a
> triple sharing a PG decreases (even though the number of 3-way
> combinations increases).
> 
> Cheers, Dan
> 
> On Wed, Jun 10, 2015 at 8:47 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> > A hidden danger in the default CRUSH rules is that if you lose 3 drives
> > in 3 different hosts at the same time, you _will_ lose data, and not
> > just some data but possibly a piece of every rbd volume you have...
> > And the probability of that happening is sadly nowhere near zero. We
> > had drives drop out of the cluster under load, which of course comes when
> > a drive fails, then another fails, then another fails… not pretty.
> >
> > Jan
> >
> >> On 09 Jun 2015, at 18:11, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> >>
> >> If you are using the default rule set (which I think has min_size 2),
> >> you can sustain 1-4 disk failures or one host failure.
> >>
> >> The reason the number of disk failures varies so wildly is that you can
> >> lose all the disks in one host.
> >>
> >> You can lose up to another 4 disks (in the same host) or 1 host
> >> without data loss, but I/O will block until Ceph can replicate at
> >> least one more copy (assuming the min_size 2 stated above).
> >> ----------------
> >> Robert LeBlanc
> >> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>
> >>
> >> On Tue, Jun 9, 2015 at 9:53 AM, kevin parrikar  wrote:
> >> > I have a 4-node cluster, each node with 5 disks (4 OSD disks and 1
> >> > operating system disk, also hosting 3 monitor processes), with the
> >> > default replica count of 3.
> >> >
> >> > Total OSD disks : 16
> >> > Total Nodes : 4
> >> >
> >> > How can I calculate the following:
> >> >
> >> > Maximum number of disk failures my cluster can handle without any
> >> > impact on current data and new writes.
> >> > Maximum number of node failures my cluster can handle without any
> >> > impact on current data and new writes.
> >> >
> >> > Thanks for any help
> >> >
> >>
> >


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




