Hello,

I'm considering erasure coding for the first time (so excuse any seemingly obvious questions) and have been staring at the various previous posts and the documentation, in particular:
http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/

Am I correct that, unlike with replication, there isn't a maximum size for the set of OSDs on the critical path?

Meaning that with 3x replication and a typical value of 100 PGs per OSD, at most 300 OSDs form a set out of which 3 OSDs need to fail for data loss. The statistical likelihood of that, based on some assumptions, is significant but not nightmarishly so. A cluster with 1500 OSDs in total is thus as susceptible as one with just 300. In other words, 3 disk losses in the big cluster don't necessarily mean data loss at all.

However, it feels like with EC essentially all OSDs can end up in the same set. Having 6 out of 1500 OSDs fail in a 10+5 EC pool with 100 PGs per OSD (a 10+5 PG survives the loss of 5 chunks, so the sixth failure is the fatal one) would then affect every last object in that cluster, not just a subset.

If these ramblings are correct (or close to it), then an obvious risk mitigation would be to pick smallish encoding sets and low numbers of PGs. For example, a 4+2 encoding with 50 PGs per OSD would reduce things to the same risk as a 3x replica pool: it also loses data on the third failure within a PG, and 50 PGs x 6 OSDs gives the same 300 as 100 PGs x 3 replicas.

Feedback welcome.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
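As a rough sanity check of the numbers in this post, here is a minimal Python sketch of the heuristic it uses. It assumes the critical set is bounded by PGs per OSD times OSDs per PG (replica count, or k+m for EC), capped at the cluster size. It also assumes data is lost once a PG sees one more failure than it tolerates. It ignores CRUSH failure domains and actual placement, so it only reproduces the back-of-the-envelope figures. The names (PoolLayout, replicated, erasure) are hypothetical and not any Ceph API.

```python
# Back-of-the-envelope sketch of the "critical set" heuristic from the post.
# Assumption: an OSD's peer set is bounded by pgs_per_osd * width (the post's
# 100 x 3 = 300 estimate), capped by cluster size. Ignores CRUSH failure
# domains and real PG placement; names here are hypothetical.

from dataclasses import dataclass


@dataclass(frozen=True)
class PoolLayout:
    name: str
    width: int        # OSDs per PG: replica count, or k + m for EC
    tolerated: int    # OSD failures within one PG that lose no data
    pgs_per_osd: int

    @property
    def failures_to_lose_data(self) -> int:
        # Data in a PG is lost once more than `tolerated` of its OSDs fail.
        return self.tolerated + 1

    def critical_set(self, total_osds: int) -> int:
        # Upper bound on OSDs sharing a PG with a given OSD, using the
        # post's pgs_per_osd * width arithmetic, capped at the cluster size.
        return min(total_osds, self.pgs_per_osd * self.width)


def replicated(size: int, pgs_per_osd: int) -> PoolLayout:
    # `size` copies survive size - 1 failures.
    return PoolLayout(f"{size}x replica", size, size - 1, pgs_per_osd)


def erasure(k: int, m: int, pgs_per_osd: int) -> PoolLayout:
    # k data + m coding chunks survive m failures.
    return PoolLayout(f"EC {k}+{m}", k + m, m, pgs_per_osd)


if __name__ == "__main__":
    total = 1500
    for pool in (replicated(3, 100), erasure(10, 5, 100), erasure(4, 2, 50)):
        print(f"{pool.name:12} critical set <= {pool.critical_set(total):4} OSDs, "
              f"data loss after {pool.failures_to_lose_data} failures in one PG")
```

Run as a script, it prints the three cases discussed above: 300 OSDs and 3 failures for 3x replication, 1500 OSDs and 6 failures for 10+5 at 100 PGs per OSD, and 300 OSDs and 3 failures for 4+2 at 50 PGs per OSD.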