Hi Tim,
With the current setup you can handle only 1 host failure without losing any data, BUT everything will probably freeze until you bring the failed node (or the OSDs in it) back up.
Your setup indicates k=6, m=2 and all 8 shards are distributed to 4 hosts (2 shards/osds per host). Be aware that a pool which uses this erasure code profile will have a min_size of 7! (min_size = k+1)
So in case of a node failure only 6 of the 8 shards remain available, which is below the min_size of 7, so no writes are accepted to the pool -> freeze of i/o.
If you change the profile to k=5 and m=3 you can have a node failure without freezing i/o. (min_size = 6)
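The arithmetic behind both profiles can be checked with a small Python sketch. It assumes min_size = k+1 as stated above and that every host holds the same number of shards; the function name pool_state and its return strings are made up for illustration and are not part of Ceph.

```python
def pool_state(k, m, shards_per_host, failed_hosts):
    """Classify an EC pool after some hosts fail.

    Assumes min_size = k + 1 and an even spread of shards over hosts.
    Hypothetical helper for illustration, not a Ceph API.
    """
    min_size = k + 1
    available = (k + m) - shards_per_host * failed_hosts
    if available >= min_size:
        return "active"        # enough shards left, I/O continues
    if available >= k:
        return "io_blocked"    # data still recoverable, but below min_size
    return "data_loss"         # fewer than k shards left


# Current profile: k=6, m=2, 2 shards per host, 1 host down -> 6 shards left
print(pool_state(6, 2, 2, 1))
# Suggested profile: k=5, m=3, 2 shards per host, 1 host down -> 6 shards left
print(pool_state(5, 3, 2, 1))
```

With one host down both profiles keep 6 shards, but only k=5, m=3 stays at or above its min_size.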
If you want to sustain 2 node failures you must increase the m even further:
for instance k=7, m=5
step choose indep 6 type host
step choose indep 2 type osd
this will distribute the 12 (k+m) shards over your 6 hosts (2 shards per host)
min_size = 8 so you can have 2 node failures without freezing i/o.
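To see where k=7, m=5 comes from, here is a rough Python sketch that works out the largest k that still keeps i/o running after a given number of host failures. It again assumes min_size = k+1 and an even spread of shards over hosts; largest_k is a hypothetical helper name, not anything from Ceph.

```python
def largest_k(hosts, shards_per_host, host_failures):
    """Return (k, m) with the largest k that keeps I/O running.

    Assumes min_size = k + 1 and shards_per_host shards on every host.
    Hypothetical helper for illustration, not a Ceph API.
    """
    total = hosts * shards_per_host                    # k + m
    surviving = total - shards_per_host * host_failures
    k = surviving - 1                                  # need surviving >= k + 1
    if k < 1:
        raise ValueError("not enough shards survive the failures")
    return k, total - k


# 6 hosts, 2 shards per host, tolerate 2 host failures
print(largest_k(6, 2, 2))
# 4 hosts, 2 shards per host, tolerate 1 host failure
print(largest_k(4, 2, 1))
```

A smaller k with a larger m also works; this only gives the most space-efficient choice under those assumptions.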
Hey all,
We are trying to get an erasure coding cluster up and running but we are having a problem getting the cluster to remain up if we lose an OSD host.
Currently we have 6 OSD hosts with 6 OSDs apiece. I'm trying to build an EC profile and a crush rule that will allow the cluster to continue running if we lose a host, but I seem to misunderstand how the configuration of an EC pool/cluster is supposed to be implemented. I would like to be able to set this up to allow for 2 host failures before data loss occurs.
Here is my crush rule:
{
"rule_id": 2,
"rule_name": "EC_ENA",
"ruleset": 2,
"type": 3,
"min_size": 6,
"max_size": 8,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "choose_indep",
"num": 4,
"type": "host"
},
{
"op": "choose_indep",
"num": 2,
"type": "osd"
},
{
"op": "emit"
}
]
}
Here is my EC profile:
crush-device-class=
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=2
plugin=jerasure
technique=reed_sol_van
w=8
Any direction or help would be greatly appreciated.
Thanks,
Tim Gipson
Systems Engineer
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com