Hello Kyle,

On Fri, 20 Dec 2013 13:37:18 -0800 Kyle Bader wrote:

> Using your data as inputs to the Ceph reliability calculator [1]
> results in the following:
>

I shall have to (literally, as in git) check that out next week...

However, before that, some questions to help me understand what we are
measuring here and how.

For starters, I really have a hard time figuring out what an "object" in
Ceph terminology is, and I have read the Architecture section of the
documentation at least twice, along with many other resources. Is an
object a CephFS file, an RBD image, or the 4MB blob on the actual OSD
filesystem?

In my case I'm only looking at RBD images for KVM volume storage. Even
given the default striping configuration, I would assume that those
12500 objects for a 50GB image would not all be in the same PG, and thus
not on just 3 OSDs total (with 3 replicas set)?
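If my mental model is right, the mapping works roughly like the
following sketch (Python, like the calculator itself; the object name
prefix and pg_num are made up, and md5 merely stands in for Ceph's
actual rjenkins hash plus CRUSH placement):

import hashlib

OBJECT_SIZE = 4 * 1024**2   # default RBD object size, 4MiB
PG_NUM      = 1100          # hypothetical pool pg_num, matching the model below

def objects_for_image(image_bytes):
    # Number of RADOS objects backing an RBD image (ceiling division).
    return -(-image_bytes // OBJECT_SIZE)

def pg_for_object(name, pg_num=PG_NUM):
    # Toy object->PG mapping; md5 is a stand-in for Ceph's rjenkins hash.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % pg_num

objs = objects_for_image(50 * 1024**3)  # 50GiB -> 12800 objects
                                        # (12500 if counted in decimal GB/MB)
pgs = {pg_for_object("rbd_data.abc123.%016x" % i) for i in range(objs)}
print("%d objects land in %d of %d PGs" % (objs, len(pgs), PG_NUM))

With that many objects hashed into ~1100 PGs, essentially every PG (and
therefore nearly every OSD in the pool, not just 3) ends up holding a
piece of the image. Please correct me if that picture is wrong.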
More questions inline below:

> Disk Modeling Parameters
>     size:          3TiB
>     FIT rate:      826 (MTBF = 138.1 years)
>     NRE rate:      1.0E-16
> RAID parameters
>     replace:       6 hours
>     recovery rate: 500MiB/s (100 minutes)
>     NRE model:     fail
>     object size:   4MiB
>
> Column legends
>     1 storage unit/configuration being modeled
>     2 probability of object survival (per 1 years)
>     3 probability of loss due to site failures (per 1 years)
>     4 probability of loss due to drive failures (per 1 years)
>     5 probability of loss due to NREs during recovery (per 1 years)
>     6 probability of loss due to replication failure (per 1 years)
>     7 expected data loss per Petabyte (per 1 years)
>
> storage       durability  PL(site)    PL(copies)  PL(NRE)     PL(rep)     loss/PiB
> ----------    ----------  ----------  ----------  ----------  ----------  ----------
> RAID-6: 9+2   6-nines     0.000e+00   2.763e-10   0.000011%   0.000e+00   9.317e+07
>

What number of disks (OSDs) did you punch in for the following run?

> Disk Modeling Parameters
>     size:          3TiB
>     FIT rate:      826 (MTBF = 138.1 years)
>     NRE rate:      1.0E-16
> RADOS parameters
>     auto mark-out: 10 minutes
>     recovery rate: 50MiB/s (40 seconds/drive)

Blink??? I guess that goes back to the number of disks, but to restore
2.25TB at 50MB/s with only 40 seconds per drive... (I spell out my
arithmetic after the next quoted block.)

>     osd fullness:  75%
>     declustering:  1100 PG/OSD
>     NRE model:     fail
>     object size:   4MB
>     stripe length: 1100

I take it that is to mean that any RBD volume of sufficient size is
indeed spread over all disks?
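To spell out the arithmetic I assume is behind that "40 seconds/drive"
figure (declustered recovery, i.e. all ~1100 PG peers of the failed OSD
re-replicating their share in parallel):

TiB = 1024**4
MiB = 1024**2

disk_size   = 3 * TiB    # modeled drive size
fullness    = 0.75       # osd fullness from the model
pgs_per_osd = 1100       # declustering factor from the model
rate        = 50 * MiB   # per-drive recovery rate, bytes/s

data_lost = disk_size * fullness     # 2.25TiB sat on the failed OSD
per_peer  = data_lost / pgs_per_osd  # ~2.1GiB for each surviving peer
seconds   = per_peer / rate          # ~43 seconds -> the "40 seconds/drive"

print("each peer moves %.1f GiB in %.0f seconds" % (per_peer / 1024**3, seconds))

If that is indeed the assumption, the 40 seconds only holds when there
really are upward of a thousand OSDs for the data to recover onto, which
is why I am asking about the disk count above.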
> Column legends
>     1 storage unit/configuration being modeled
>     2 probability of object survival (per 1 years)
>     3 probability of loss due to site failures (per 1 years)
>     4 probability of loss due to drive failures (per 1 years)
>     5 probability of loss due to NREs during recovery (per 1 years)
>     6 probability of loss due to replication failure (per 1 years)
>     7 expected data loss per Petabyte (per 1 years)
>
> storage       durability  PL(site)    PL(copies)  PL(NRE)     PL(rep)     loss/PiB
> ----------    ----------  ----------  ----------  ----------  ----------  ----------
> RADOS: 3 cp   10-nines    0.000e+00   5.232e-08   0.000116%   0.000e+00   6.486e+03
>
> [1] https://github.com/ceph/ceph-tools/tree/master/models/reliability

Regards,

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com