Best practice K/M-parameters EC pool

Hi Craig,

I assume the reason for the 48-hour recovery time is to keep the cost of the cluster low? I wrote "1h recovery time" because that is roughly the time it would take to move 4TB over a 10Gb/s link. Could you upgrade your hardware to reduce the recovery time to less than two hours? Or are there factors other than cost that prevent this?
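
For reference, here is the back-of-the-envelope arithmetic behind the "1h" figure, as a minimal Python sketch (it assumes the full 4TB must move over a single dedicated 10Gb/s link, ignoring protocol overhead):

    disk_bytes = 4e12            # 4 TB of data to re-replicate
    link_bytes_per_s = 10e9 / 8  # 10 Gb/s expressed in bytes per second
    seconds = disk_bytes / link_bytes_per_s
    print(seconds / 3600)        # ~0.9 hours, hence "roughly 1h"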

Cheers

On 26/08/2014 19:37, Craig Lewis wrote:
> My OSD rebuild time is more like 48 hours (4TB disks, >60% full, osd max backfills = 1). I believe that increases my risk of failure by a factor of 48^2. Since your numbers are a failure rate per hour per disk, I need to consider the risk over the whole rebuild window for each disk. More formally, the risk scales with rebuild time to the power of (replicas - 1).
> 
> So I'm at 2304/100,000,000, or approximately 1/43,000. That's a much higher risk than 1/10^8.
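> 
> As a sanity check, here is that arithmetic as a minimal Python sketch (the 1/10^8 baseline is the 1-hour figure above):
> 
>     baseline_risk = 1e-8   # estimated risk of data loss with a 1-hour rebuild
>     rebuild_hours = 48
>     replicas = 3
>     # risk scales with rebuild time to the power of (replicas - 1)
>     risk = baseline_risk * rebuild_hours ** (replicas - 1)
>     print(risk, 1 / risk)  # 2.304e-05, ~43,000 -> about 1/43,000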
> 
> 
> A risk of 1/43,000 means that I'm more likely to lose data due to human error than to disk failure. Still, I can put in a small amount of effort to optimize recovery speed and lower this number. Managing human error is much harder.
> 
> On Tue, Aug 26, 2014 at 7:12 AM, Loic Dachary <loic at dachary.org> wrote:
> 
>     Using percentages instead of numbers led me to calculation errors. Here it is again using 1/100 instead of % for clarity ;-)
> 
>     Assuming that:
> 
>     * The pool is configured for three replicas (size = 3 which is the default)
>     * It takes one hour for Ceph to recover from the loss of a single OSD
>     * Any other disk has a 1/100,000 chance of failing within the hour following the failure of the first disk, assuming the AFR (https://en.wikipedia.org/wiki/Annualized_failure_rate) of every disk is 8%, divided by the number of hours in a year: 0.08 / 8760 ~= 1/100,000 (see the sketch after this list)
>     * A given disk does not participate in more than 100 PGs
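> 
>     To make the per-hour figure concrete, here is the conversion as a short Python sketch (the 8% AFR is the assumption stated above):
> 
>         afr = 0.08                 # assumed 8% annualized failure rate per disk
>         hours_per_year = 365 * 24  # 8760
>         p_hour = afr / hours_per_year
>         print(p_hour)              # ~9.1e-06, i.e. roughly 1/100,000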
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
