On 15/08/2014 15:42, Erik Logtenberg wrote:
>>> I haven't done the actual calculations, but given some % chance of disk
>>> failure, I would assume that losing x out of y disks has roughly the
>>> same chance as losing 2*x out of 2*y disks over the same period.
>>>
>>> That's also why you generally want to limit RAID5 arrays to maybe 6
>>> disks or so and move to RAID6 for bigger arrays. For arrays bigger than
>>> 20 disks you would usually split those into separate arrays, just to
>>> keep the (parity disks / total disks) fraction high enough.
>>>
>>> With regard to data safety I would guess that 3+2 and 6+4 are roughly
>>> equal, although the behaviour of 6+4 is probably easier to predict
>>> because bigger numbers make your calculations less dependent on
>>> individual deviations in reliability.
>>>
>>> Do you guys feel this argument is valid?
>>
>> Here is how I reason about it, roughly:
>>
>> If the probability of losing a disk (before its failure can be recovered)
>> is p = 0.1%, the probability of losing two disks simultaneously is roughly
>> p^2 (0.0001%), three disks roughly p^3 (0.0000001%), and four disks roughly
>> p^4 (0.0000000001%).
>>
>> Accurately calculating the reliability of the system as a whole is a lot
>> more complex (see
>> https://wiki.ceph.com/Development/Add_erasure_coding_to_the_durability_model/
>> for more information).
>>
>> Cheers
>
> Okay, I see that in your calculation you leave the total number of disks
> completely out of the equation.

Yes. If you have a small number of disks I'm not sure how to calculate the
durability. For instance, if I have a 50-disk cluster within a single rack,
the durability is dominated by the probability that the rack is set on fire,
and increasing m from 3 to 5 is most certainly pointless ;-)

> The link you provided is very useful indeed and does some actual
> calculations. Interestingly, the example in the details page [1] uses
> k=32 and m=32 for a total of 64 blocks. Those are much bigger values than
> Mark Nelson mentioned earlier. Is that example merely meant to demonstrate
> the theoretical advantages, or would you actually recommend using those
> numbers in practice? Let's assume that we have at least 64 OSDs available;
> would you recommend k=32 and m=32?

It is theoretical; I'm not aware of any Ceph use case requiring that kind of
setting. There may be a use case though, it's not absurd, just not common. I
would be happy to hear about it.

Cheers

> [1]
> https://wiki.ceph.com/Development/Add_erasure_coding_to_the_durability_model/Technical_details_on_the_model

-- 
Loïc Dachary, Artisan Logiciel Libre
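
A minimal sketch of the back-of-the-envelope reasoning in the message above,
assuming disk failures are independent and using an illustrative per-disk
failure probability p (the value of p and the k/m pairs below are examples,
not recommendations from the thread). An object stored with a k+m
erasure-code profile is unreadable only if more than m of its k+m OSDs fail
before recovery completes, so under independence the loss probability is a
binomial tail sum:

# Back-of-the-envelope durability for a k+m erasure-coded pool, assuming
# independent disk failures (a strong simplification; see the durability
# model linked above for a proper treatment).
from math import comb


def loss_probability(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k+m OSDs holding an object fail
    before recovery completes, i.e. the object can no longer be decoded."""
    n = k + m
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.001  # illustrative chance a disk fails within the recovery window
    for k, m in ((3, 2), (6, 4), (32, 32)):
        print(f"k={k:2d} m={m:2d}  P(object loss) ~ {loss_probability(k, m, p):.3e}")

Under this toy model 6+4 comes out noticeably more durable than 3+2, and
32+32 astronomically so; the durability model on the wiki exists precisely
because real failures are neither independent (rack fires) nor instantly
repaired, so these numbers are only a lower bound on the complexity involved.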