Re: Question about reliability model result

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 28 Aug 2015 11:26:24 +0100



I haven't looked at the internals of the model, but the PL(site)
you've pointed out is definitely the crux of the issue here. In the
first grouping, it's just looking at the probability of data loss due
to failing disks, and as the copies increase that goes down. In the
second grouping, it's including other factors like the entire data
center getting knocked out. That probability (PL(site), about 0.1%
here) is far greater than the probability of losing data to three disk
failures, so it caps the total data durability at about 99.9%
regardless of the number of copies.
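
To illustrate the capping effect, here is a rough sketch. It is not the
model's actual code: it assumes site loss and copy loss are independent,
leaves out PL(NRE) and PL(rep), and reads the 5.090e-11 entry as a plain
fraction rather than a percentage. The function and constant names are
made up for this example; the numbers come from the tables quoted below.

```python
# Simplified sketch, NOT the ceph-tools reliability model itself.
# Assumptions: loss events are independent; PL(NRE) and PL(rep) are ignored.

PL_SITE = 0.00099950  # 0.099950% from the "Site (1 PB)" row

# PL(copies) per replica count, taken from the tables.
# 5.090e-11 is assumed to be a fraction, not a percentage.
PL_COPIES = {1: 0.00721457, 2: 0.00000049, 3: 5.090e-11}


def durability(*loss_probabilities):
    """Probability of surviving every listed (independent) loss event."""
    survive = 1.0
    for p in loss_probabilities:
        survive *= 1.0 - p
    return survive


def report():
    """Compare disk-only durability with durability including site loss."""
    lines = []
    for copies, pl_copies in sorted(PL_COPIES.items()):
        disks_only = durability(pl_copies)
        with_site = durability(PL_SITE, pl_copies)
        lines.append(
            f"{copies} cp: disks only {disks_only:.9%}, "
            f"with site {with_site:.3%}"
        )
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Under these assumptions the combined figure can never exceed
1 - PL(site), so with 2 or 3 copies it stays at roughly 99.9% even though
the disk-only figure keeps improving.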
-Greg

On Sat, Aug 22, 2015 at 2:38 AM, dahan <dahanhsi@xxxxxxxxx> wrote:
> Hi,
> I have cross-posted this issue here and on GitHub,
> but have had no response yet.
>
> Any advice?
>
> On Mon, Aug 10, 2015 at 10:21 AM, dahan <dahanhsi@xxxxxxxxx> wrote:
>>
>>
>> Hi all, I have tried the reliability model:
>> https://github.com/ceph/ceph-tools/tree/master/models/reliability
>>
>> I ran the tool with the default configuration, and cannot understand the
>> result.
>>
>> ```
>>     storage               durability    PL(site)  PL(copies)     PL(NRE)     PL(rep)    loss/PiB
>>     ----------            ----------  ----------  ----------  ----------  ----------  ----------
>>     Disk: Enterprise         99.119%   0.000e+00   0.721457%   0.159744%   0.000e+00   8.812e+12
>>     RADOS: 1 cp              99.279%   0.000e+00   0.721457%   0.000865%   0.000e+00   5.411e+12
>>     RADOS: 2 cp              7-nines   0.000e+00   0.000049%   0.003442%   0.000e+00   9.704e+06
>>     RADOS: 3 cp             11-nines   0.000e+00   5.090e-11   3.541e-09   0.000e+00   6.655e+02
>> ```
>>
>> ```
>>     storage               durability    PL(site)  PL(copies)     PL(NRE)     PL(rep)    loss/PiB
>>     ----------            ----------  ----------  ----------  ----------  ----------  ----------
>>     Site (1 PB)              99.900%   0.099950%   0.000e+00   0.000e+00   0.000e+00   9.995e+11
>>     RADOS: 1-site, 1-cp      99.179%   0.099950%   0.721457%   0.000865%   0.000e+00   1.010e+12
>>     RADOS: 1-site, 2-cp      99.900%   0.099950%   0.000049%   0.003442%   0.000e+00   9.995e+11
>>     RADOS: 1-site, 3-cp      99.900%   0.099950%   5.090e-11   3.541e-09   0.000e+00   9.995e+11
>> ```
>>
>> The two result tables show different trends. In the first table, durability
>> increases with the copy count: 1 cp < 2 cp < 3 cp. In the second table,
>> however, 1 cp < 2 cp = 3 cp.
>>
>> The two tables have the same PL(copies), PL(NRE), and PL(rep).
>> The only difference is PL(site). PL(site) is constant, since the number of
>> sites is constant, so the trend should be the same.
>>
>> How can this result be explained?
>>
>> Did I miss anything? Thanks
>>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>