The first problem here is that you are using crush-failure-domain=osd when you should be using crush-failure-domain=host. With three hosts, you should use k=2, m=1, and even that is not recommended in a production environment.

On Mon, Dec 4, 2023, 23:26 duluxoz <duluxoz@xxxxxxxxx> wrote:
> Hi All,
>
> Looking for some help/explanation around erasure code pools, etc.
>
> I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs
> (HDDs) and each box running Monitor, Manager, and iSCSI Gateway. For the
> record, the cluster runs beautifully, without resource issues, etc.
>
> I created an Erasure Code Profile, etc.:
>
> ~~~
> ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 crush-failure-domain=osd
> ceph osd crush rule create-erasure my_ec_rule my_ec_profile
> ceph osd crush rule create-replicated my_replicated_rule default host
> ~~~
>
> My Crush Map is:
>
> ~~~
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0 class hdd
> device 1 osd.1 class hdd
> device 2 osd.2 class hdd
> device 3 osd.3 class hdd
> device 4 osd.4 class hdd
> device 5 osd.5 class hdd
> device 6 osd.6 class hdd
> device 7 osd.7 class hdd
> device 8 osd.8 class hdd
> device 9 osd.9 class hdd
> device 10 osd.10 class hdd
> device 11 osd.11 class hdd
> device 12 osd.12 class hdd
> device 13 osd.13 class hdd
> device 14 osd.14 class hdd
> device 15 osd.15 class hdd
> device 16 osd.16 class hdd
> device 17 osd.17 class hdd
> device 18 osd.18 class hdd
> device 19 osd.19 class hdd
> device 20 osd.20 class hdd
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> # buckets
> host ceph_1 {
> 	id -3 # do not change unnecessarily
> 	id -4 class hdd # do not change unnecessarily
> 	# weight 38.09564
> 	alg straw2
> 	hash 0 # rjenkins1
> 	item osd.0 weight 5.34769
> 	item osd.1 weight 5.45799
> 	item osd.2 weight 5.45799
> 	item osd.3 weight 5.45799
> 	item osd.4 weight 5.45799
> 	item osd.5 weight 5.45799
> 	item osd.6 weight 5.45799
> }
> host ceph_2 {
> 	id -5 # do not change unnecessarily
> 	id -6 class hdd # do not change unnecessarily
> 	# weight 38.09564
> 	alg straw2
> 	hash 0 # rjenkins1
> 	item osd.7 weight 5.34769
> 	item osd.8 weight 5.45799
> 	item osd.9 weight 5.45799
> 	item osd.10 weight 5.45799
> 	item osd.11 weight 5.45799
> 	item osd.12 weight 5.45799
> 	item osd.13 weight 5.45799
> }
> host ceph_3 {
> 	id -7 # do not change unnecessarily
> 	id -8 class hdd # do not change unnecessarily
> 	# weight 38.09564
> 	alg straw2
> 	hash 0 # rjenkins1
> 	item osd.14 weight 5.34769
> 	item osd.15 weight 5.45799
> 	item osd.16 weight 5.45799
> 	item osd.17 weight 5.45799
> 	item osd.18 weight 5.45799
> 	item osd.19 weight 5.45799
> 	item osd.20 weight 5.45799
> }
> root default {
> 	id -1 # do not change unnecessarily
> 	id -2 class hdd # do not change unnecessarily
> 	# weight 114.28693
> 	alg straw2
> 	hash 0 # rjenkins1
> 	item ceph_1 weight 38.09564
> 	item ceph_2 weight 38.09564
> 	item ceph_3 weight 38.09564
> }
>
> # rules
> rule replicated_rule {
> 	id 0
> 	type replicated
> 	step take default
> 	step chooseleaf firstn 0 type host
> 	step emit
> }
> rule my_replicated_rule {
> 	id 1
> 	type replicated
> 	step take default
> 	step chooseleaf firstn 0 type host
> 	step emit
> }
> rule my_ec_rule {
> 	id 2
> 	type erasure
> 	step set_chooseleaf_tries 5
> 	step set_choose_tries 100
> 	step take default
> 	step choose indep 3 type host
> 	step chooseleaf indep 2 type osd
> 	step emit
> }
>
> # end crush map
> ~~~
>
> Finally I create a pool:
>
> ~~~
> ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
> ceph osd pool application enable my_meta_pool rbd
> rbd pool init my_meta_pool
> rbd pool init my_pool
> rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool --image-feature journaling
> ~~~
>
> So all this is to have some VMs (oVirt VMs, for the record) with
> automatic fail-over in the case of a Ceph Node loss, i.e. I was trying to
> "replicate" a 3-Disk RAID 5 array across the Ceph Nodes, so that I could
> lose a Node and still have a working set of VMs.
>
> However, I took one of the Ceph Nodes down (gracefully) for some
> maintenance the other day and I lost *all* the VMs (i.e. oVirt complained
> that there was no active pool). As soon as I brought the down node back
> up, everything was good again.
>
> So my question is: What did I do wrong with my config?
>
> Should I, for example, change the EC Profile to `k=2, m=1`? But how is
> that practically different from `k=4, m=2`? Yes, the latter spreads the
> pool over more disks, but it should still only put 2 disks on each node,
> shouldn't it?
>
> Thanks in advance
>
> Cheers
>
> Dulux-Oz
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
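To make the failure-domain and k/m point concrete, here is a rough sketch (Python, purely illustrative) of the chunk arithmetic when one of the three hosts goes down. It assumes each host holds the same number of chunks of every PG: 2 per host for the custom my_ec_rule above, 1 per host for k=2, m=1 with crush-failure-domain=host. It also assumes the default min_size Ceph gives a new EC pool, which as far as I know is k + min(1, m - 1), i.e. k+1 whenever m >= 2. Check the real value on your pool with `ceph osd pool get my_pool min_size`. The function names are made up for the example and are not a Ceph API.

```python
"""Rough model of an erasure-coded PG losing whole hosts.

Assumptions made for this sketch only:
  * every host holds the same number of chunks of each PG;
  * min_size is the assumed Ceph default for new EC pools,
    k + min(1, m - 1), unless a value is passed explicitly.
Function names are hypothetical, not part of any Ceph API.
"""
from typing import Optional


def default_ec_min_size(k: int, m: int) -> int:
    # Assumed default: k + 1 when m >= 2, plain k when m == 1.
    return k + min(1, m - 1)


def surviving_chunks(k: int, m: int, chunks_per_host: int, hosts_lost: int) -> int:
    # Each lost host takes chunks_per_host of the k + m chunks with it.
    return max(0, k + m - chunks_per_host * hosts_lost)


def pg_state_after_host_loss(
    k: int,
    m: int,
    chunks_per_host: int,
    hosts_lost: int = 1,
    min_size: Optional[int] = None,
) -> str:
    if min_size is None:
        min_size = default_ec_min_size(k, m)
    left = surviving_chunks(k, m, chunks_per_host, hosts_lost)
    if left < k:
        # Not enough chunks left to rebuild the objects at all.
        return "unavailable"
    if left < min_size:
        # Data is recoverable, but the PG will not go active, so client I/O blocks.
        return "inactive"
    # The PG keeps serving I/O in a degraded state.
    return "active"


if __name__ == "__main__":
    scenarios = {
        "k=4, m=2, 2 chunks per host (my_ec_rule)": (4, 2, 2),
        "k=2, m=1, 1 chunk per host (failure domain host)": (2, 1, 1),
    }
    for label, (k, m, per_host) in scenarios.items():
        print(f"{label}: {pg_state_after_host_loss(k, m, per_host)}")
```

In the first scenario only 4 of the 6 chunks survive. That is enough to reconstruct the data (4 = k) but below the assumed min_size of 5, so the PGs go inactive and client I/O stops until the node comes back, which matches what oVirt reported. The second scenario keeps 2 of 3 chunks, which meets its assumed min_size of 2, so I/O continues, but with no redundancy left until the host returns; that is why k=2, m=1 is still not a production setup.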