Hey all,

We are trying to get an erasure coding cluster up and running, but we are having a problem getting the cluster to stay up if we lose an OSD host. Currently we have 6 OSD hosts with 6 OSDs apiece. I'm trying to build an EC profile and a CRUSH rule that will allow the cluster to keep running if we lose a host, but I seem to misunderstand how the configuration of an EC pool/cluster is supposed to be implemented. I would like to set this up to tolerate 2 host failures before data loss occurs.

Here is my CRUSH rule:

    {
        "rule_id": 2,
        "rule_name": "EC_ENA",
        "ruleset": 2,
        "type": 3,
        "min_size": 6,
        "max_size": 8,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "choose_indep",
                "num": 4,
                "type": "host"
            },
            {
                "op": "choose_indep",
                "num": 2,
                "type": "osd"
            },
            {
                "op": "emit"
            }
        ]
    }

Here is my EC profile:

    crush-device-class=
    crush-failure-domain=host
    crush-root=default
    jerasure-per-chunk-alignment=false
    k=6
    m=2
    plugin=jerasure
    technique=reed_sol_van
    w=8

Any direction or help would be greatly appreciated.

Thanks,

Tim Gipson
Systems Engineer
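
P.S. In case it helps to reproduce this, below is roughly how the profile and pool were set up. The profile name, pool name, and PG counts are placeholders rather than the exact values from our cluster, and the last command is just how the rule dump above was generated:

    # Create the EC profile (same parameters as listed above;
    # w=8 and jerasure-per-chunk-alignment=false are the defaults)
    ceph osd erasure-code-profile set ec_ena_profile \
        k=6 m=2 \
        plugin=jerasure technique=reed_sol_van \
        crush-failure-domain=host crush-root=default

    # Create the pool against that profile, then point it at the custom rule
    ceph osd pool create ecpool 1024 1024 erasure ec_ena_profile
    ceph osd pool set ecpool crush_rule EC_ENA

    # Dump the rule the pool is actually using (output shown above)
    ceph osd crush rule dump EC_ENA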