Hi All,
I'm looking for some help/explanation around erasure-coded pools.
I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs
(HDDs) and each box running Monitor, Manager, and iSCSI Gateway. For the
record, the cluster runs beautifully, with no resource issues.
I created an Erasure Code Profile and CRUSH rules:
~~~
ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 \
    crush-failure-domain=osd
ceph osd crush rule create-erasure my_ec_rule my_ec_profile
ceph osd crush rule create-replicated my_replicated_rule default host
~~~
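For reference, the profile and rules can be double-checked with something like the following (standard ceph CLI calls, shown here just for completeness):
~~~
# Show the parameters stored in the EC profile (plugin, k, m, failure domain)
ceph osd erasure-code-profile get my_ec_profile

# Dump the CRUSH rules that were created
ceph osd crush rule dump my_ec_rule
ceph osd crush rule dump my_replicated_rule
~~~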
My Crush Map is:
~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host ceph_1 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 38.09564
alg straw2
hash 0 # rjenkins1
item osd.0 weight 5.34769
item osd.1 weight 5.45799
item osd.2 weight 5.45799
item osd.3 weight 5.45799
item osd.4 weight 5.45799
item osd.5 weight 5.45799
item osd.6 weight 5.45799
}
host ceph_2 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 38.09564
alg straw2
hash 0 # rjenkins1
item osd.7 weight 5.34769
item osd.8 weight 5.45799
item osd.9 weight 5.45799
item osd.10 weight 5.45799
item osd.11 weight 5.45799
item osd.12 weight 5.45799
item osd.13 weight 5.45799
}
host ceph_3 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 38.09564
alg straw2
hash 0 # rjenkins1
item osd.14 weight 5.34769
item osd.15 weight 5.45799
item osd.16 weight 5.45799
item osd.17 weight 5.45799
item osd.18 weight 5.45799
item osd.19 weight 5.45799
item osd.20 weight 5.45799
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 114.28693
alg straw2
hash 0 # rjenkins1
item ceph_1 weight 38.09564
item ceph_2 weight 38.09564
item ceph_3 weight 38.09564
}
# rules
rule replicated_rule {
id 0
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule my_replicated_rule {
id 1
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule my_ec_rule {
id 2
type erasure
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step choose indep 3 type host
step chooseleaf indep 2 type osd
step emit
}
# end crush map
~~~
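In case it helps anyone reproduce this, the EC rule can be dry-run against the compiled map with crushtool (rule id 2 is my_ec_rule, 6 = k+m; the file name is just an example):
~~~
# Grab the binary CRUSH map and test which OSDs rule 2 would pick for 6 chunks
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 2 --num-rep 6 --show-mappings | head
~~~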
Finally, I created the pool and the RBD image:
~~~
ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
ceph osd pool application enable my_meta_pool rbd
rbd pool init my_meta_pool
rbd pool init my_pool
rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool \
    --image-feature journaling
~~~
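For completeness, this is roughly how I've been inspecting the resulting pool settings (for `k=4, m=2` the size should come out as 6, and min_size defaults to k+1 for EC pools, as far as I understand it):
~~~
# Check the sizing on the EC data pool
ceph osd pool get my_pool size
ceph osd pool get my_pool min_size

# Or everything at once
ceph osd pool ls detail
~~~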
So all this is to have some VMs (oVirt VMs, for the record) with
automatic failover in the case of a Ceph Node loss - i.e. I was trying to
"replicate" a 3-Disk RAID 5 array across the Ceph Nodes, so that I could
lose a Node and still have a working set of VMs.
However, I took one of the Ceph Nodes down (gracefully) for some
maintenance the other day and I lost *all* the VMs (i.e. oVirt complained
that there was no active pool). As soon as I brought the down node back
up, everything was good again.
So my question is: What did I do wrong with my config?
Should I, for example, change the EC Profile to `k=2, m=1`? And how would
that be practically different from `k=4, m=2`? Yes, the latter spreads the
pool over more disks, but it should still only put chunks on two disks per
node, shouldn't it?
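For reference, the `k=2, m=1` version I have in mind would look something like this (the names `my_ec_profile_21` / `my_ec_rule_21` are just placeholders; I haven't actually run this yet):
~~~
# Hypothetical 2+1 profile, otherwise matching the existing one
ceph osd erasure-code-profile set my_ec_profile_21 plugin=jerasure k=2 m=1 \
    crush-failure-domain=osd
ceph osd crush rule create-erasure my_ec_rule_21 my_ec_profile_21
~~~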
Thanks in advance
Cheers
Dulux-Oz