Erasure Coding failure domain (again)

Hello,

considering erasure coding for the first time (so excuse seemingly
obvious questions) and staring at the various previous posts
and documentation and in particular:
http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/

Am I correct that, unlike with replication, there isn't an upper bound on the
size of the critical set of OSDs (the set out of which a small number of
failures can cause data loss)?

Meaning that with replication x3 and a typical value of 100 PGs per OSD, at
most roughly 300 OSDs form such a set, out of which 3 specific OSDs need to
fail for data loss. The statistical likelihood of that, based on some
assumptions, is significant, but not nightmarishly so.
A cluster with 1500 OSDs in total is thus only as susceptible as one with
just 300, meaning that 3 disk losses in the big cluster don't necessarily
mean data loss at all.
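To make the per-PG odds concrete, here is a rough sketch (a hypothetical helper of my own, not anything in Ceph) under the simplifying assumption that each PG lands on uniformly random OSDs, which CRUSH of course doesn't quite do. A PG is lost when more shards fail than the pool can tolerate, which is a hypergeometric tail:

```python
from math import comb

def p_pg_lost(n_osds, shards, max_tolerated, failures):
    """Probability that a single PG, placed on `shards` uniformly
    random OSDs out of `n_osds`, has more than `max_tolerated` of
    them among `failures` simultaneously failed OSDs."""
    hit = sum(comb(shards, j) * comb(n_osds - shards, failures - j)
              for j in range(max_tolerated + 1, min(shards, failures) + 1))
    return hit / comb(n_osds, failures)

# replication x3: a PG is lost only if all 3 copies fail
p_rep = p_pg_lost(n_osds=1500, shards=3, max_tolerated=2, failures=3)

# EC 10+5: a PG is lost if more than 5 of its 15 shards fail
p_ec = p_pg_lost(n_osds=1500, shards=15, max_tolerated=5, failures=6)

# expected number of lost PGs is then roughly num_pgs * p (union bound)
```

The interesting part is that the per-PG probability isn't the whole story; what worries me below is how many PGs (and hence objects) a single failure event can touch.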

However, it feels like with EC all OSDs can essentially end up in the same
set, and thus having 6 out of 1500 OSDs fail in a 10+5 EC pool with 100 PGs
per OSD could affect every last object in that cluster, not just a subset.

If these ramblings are correct (or close to it), then an obvious risk
mitigation would be to pick smallish encoding sets and low PG counts.
For example, a 4+2 encoding with 50 PGs per OSD would bring things back down
to roughly the same risk as a 3x replica pool.
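As a back-of-the-envelope comparison of how many OSDs end up in one OSD's "set" in each case (again a hypothetical helper of mine, taking the worst case where every peer OSD of every PG is distinct, so it counts slightly differently from the ~300 figure above):

```python
def critical_set_size(shards_per_pg, pgs_per_osd):
    # Worst-case number of distinct OSDs sharing at least one PG
    # with a given OSD: each of its PGs involves (shards_per_pg - 1)
    # other OSDs, all of which may be distinct.
    return 1 + (shards_per_pg - 1) * pgs_per_osd

print(critical_set_size(3, 100))   # replication x3, 100 PGs/OSD -> 201
print(critical_set_size(15, 100))  # EC 10+5, 100 PGs/OSD -> 1401
print(critical_set_size(6, 50))    # EC 4+2, 50 PGs/OSD -> 251
```

So with 10+5 and 100 PGs per OSD the worst-case set is close to the whole 1500-OSD cluster, while 4+2 with 50 PGs per OSD is back in the same ballpark as replication.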

Feedback welcome.

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com