Re: EC Profiles & DR

David Rivera <rivera.david87@xxxxxxxxx> · Tue, 5 Dec 2023 00:53:59 -0800

The first problem here is that you are using crush-failure-domain=osd
when you should be using crush-failure-domain=host. With only three hosts,
that limits you to k=2, m=1, which is not recommended in a production
environment.
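
To make the arithmetic concrete, here is a rough sketch (not anything
Ceph ships) that counts how many EC chunks survive when hosts go down.
It assumes chunks are spread evenly across hosts, as my_ec_rule does
with 2 per host, and it assumes the pool's min_size is k+1, which I
believe is the default for erasure-coded pools; check yours with
`ceph osd pool get my_pool min_size`. The function names are made up
for illustration.

```python
def surviving_chunks(k, m, hosts, hosts_down):
    """Chunks left per PG when `hosts_down` hosts are offline.

    Assumes the k+m chunks are spread evenly over `hosts` hosts.
    """
    total = k + m
    if total % hosts != 0:
        raise ValueError("k+m must divide evenly across hosts in this sketch")
    per_host = total // hosts
    return total - per_host * hosts_down


def pg_state(k, m, hosts, hosts_down, min_size=None):
    """Rough PG availability, given an assumed min_size (default k+1)."""
    if min_size is None:
        min_size = k + 1  # assumption: default min_size for EC pools
    left = surviving_chunks(k, m, hosts, hosts_down)
    if left < k:
        return "data unavailable"  # fewer than k chunks: cannot reconstruct
    if left < min_size:
        return "inactive"  # reconstructable, but I/O is blocked
    return "active"


if __name__ == "__main__":
    # The profile from the original post: k=4, m=2, 2 chunks per host.
    print("k=4 m=2, one host down:", pg_state(4, 2, hosts=3, hosts_down=1))
    # The three-host alternative: k=2, m=1, 1 chunk per host.
    print("k=2 m=1, one host down:", pg_state(2, 1, hosts=3, hosts_down=1))
```

Under those assumptions both layouts are left with exactly k chunks
after one host goes down, so there is no redundancy remaining, which is
part of why k=2, m=1 on three hosts is not a production setup.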

On Mon, Dec 4, 2023, 23:26 duluxoz <duluxoz@xxxxxxxxx> wrote:

> Hi All,
>
> Looking for some help/explanation around erasure code pools, etc.
>
> I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs
> (HDDs) and each box running Monitor, Manager, and iSCSI Gateway. For the
> record the cluster runs beautifully, without resource issues, etc.
>
> I created an Erasure Code Profile, etc:
>
> ~~~
> ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 \
>     crush-failure-domain=osd
> ceph osd crush rule create-erasure my_ec_rule my_ec_profile
> ceph osd crush rule create-replicated my_replicated_rule default host
> ~~~
>
> My Crush Map is:
>
> ~~~
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0 class hdd
> device 1 osd.1 class hdd
> device 2 osd.2 class hdd
> device 3 osd.3 class hdd
> device 4 osd.4 class hdd
> device 5 osd.5 class hdd
> device 6 osd.6 class hdd
> device 7 osd.7 class hdd
> device 8 osd.8 class hdd
> device 9 osd.9 class hdd
> device 10 osd.10 class hdd
> device 11 osd.11 class hdd
> device 12 osd.12 class hdd
> device 13 osd.13 class hdd
> device 14 osd.14 class hdd
> device 15 osd.15 class hdd
> device 16 osd.16 class hdd
> device 17 osd.17 class hdd
> device 18 osd.18 class hdd
> device 19 osd.19 class hdd
> device 20 osd.20 class hdd
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> # buckets
> host ceph_1 {
>    id -3            # do not change unnecessarily
>    id -4 class hdd  # do not change unnecessarily
>    # weight 38.09564
>    alg straw2
>    hash 0  # rjenkins1
>    item osd.0 weight 5.34769
>    item osd.1 weight 5.45799
>    item osd.2 weight 5.45799
>    item osd.3 weight 5.45799
>    item osd.4 weight 5.45799
>    item osd.5 weight 5.45799
>    item osd.6 weight 5.45799
> }
> host ceph_2 {
>    id -5            # do not change unnecessarily
>    id -6 class hdd  # do not change unnecessarily
>    # weight 38.09564
>    alg straw2
>    hash 0  # rjenkins1
>    item osd.7 weight 5.34769
>    item osd.8 weight 5.45799
>    item osd.9 weight 5.45799
>    item osd.10 weight 5.45799
>    item osd.11 weight 5.45799
>    item osd.12 weight 5.45799
>    item osd.13 weight 5.45799
> }
> host ceph_3 {
>    id -7            # do not change unnecessarily
>    id -8 class hdd  # do not change unnecessarily
>    # weight 38.09564
>    alg straw2
>    hash 0  # rjenkins1
>    item osd.14 weight 5.34769
>    item osd.15 weight 5.45799
>    item osd.16 weight 5.45799
>    item osd.17 weight 5.45799
>    item osd.18 weight 5.45799
>    item osd.19 weight 5.45799
>    item osd.20 weight 5.45799
> }
> root default {
>    id -1            # do not change unnecessarily
>    id -2 class hdd  # do not change unnecessarily
>    # weight 114.28693
>    alg straw2
>    hash 0  # rjenkins1
>    item ceph_1 weight 38.09564
>    item ceph_2 weight 38.09564
>    item ceph_3 weight 38.09564
> }
>
> # rules
> rule replicated_rule {
>    id 0
>    type replicated
>    step take default
>    step chooseleaf firstn 0 type host
>    step emit
> }
> rule my_replicated_rule {
>    id 1
>    type replicated
>    step take default
>    step chooseleaf firstn 0 type host
>    step emit
> }
> rule my_ec_rule {
>    id 2
>    type erasure
>    step set_chooseleaf_tries 5
>    step set_choose_tries 100
>    step take default
>    step choose indep 3 type host
>    step chooseleaf indep 2 type osd
>    step emit
> }
>
> # end crush map
> ~~~
>
> Finally I create a pool:
>
> ~~~
> ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
> ceph osd pool application enable my_meta_pool rbd
> rbd pool init my_meta_pool
> rbd pool init my_pool
> rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool \
>     --image-feature journaling
> ~~~
>
> So all this is to have some VMs (oVirt VMs, for the record) with
> automatic fail-over in the case of a Ceph Node loss - ie I was trying to
> "replicate" a 3-Disk RAID 5 array across the Ceph Nodes, so that I could
> lose a Node and still have a working set of VMs.
>
> However, I took one of the Ceph Nodes down (gracefully) for some
> maintenance the other day and I lost *all* the VMs (ie oVirt complained
> that there was no active pool). As soon as I brought the down node back
> up everything was good again.
>
> So my question is: What did I do wrong with my config?
>
> Should I, for example, change the EC Profile to `k=2, m=1`? And how is
> that practically different from `k=4, m=2`? Yes, the latter spreads the
> pool over more disks, but it should still only put 2 disks on each node,
> shouldn't it?
>
> Thanks in advance
>
> Cheers
>
> Dulux-Oz
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>