Re: EC Profiles & DR

Thanks David, I knew I had something wrong  :-)

Just for my own edification: why is k=2, m=1 not recommended for production? Is it considered too "fragile", or is there something else?

Cheers

Dulux-Oz

On 05/12/2023 19:53, David Rivera wrote:
First problem here is that you are using crush-failure-domain=osd when you should be using crush-failure-domain=host. With three hosts you should use k=2, m=1; this is not recommended in a production environment.
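
For anyone following along, a minimal sketch of what that suggestion could look like - the profile and rule names below are made up for illustration, not taken from the cluster in question:

~~~
# Illustrative names only: a new profile and rule with a host failure domain,
# against which a new pool would then be created.
ceph osd erasure-code-profile set my_ec_host_profile plugin=jerasure k=2 m=1 crush-failure-domain=host
ceph osd crush rule create-erasure my_ec_host_rule my_ec_host_profile
~~~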

On Mon, Dec 4, 2023, 23:26 duluxoz <duluxoz@xxxxxxxxx> wrote:

    Hi All,

    Looking for some help/explanation around erasure code pools, etc.

    I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs
    (HDDs) and each box running a Monitor, Manager, and iSCSI Gateway. For
    the record, the cluster runs beautifully, without resource issues, etc.

    I created an Erasure Code Profile, etc:

    ~~~
    ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 crush-failure-domain=osd
    ceph osd crush rule create-erasure my_ec_rule my_ec_profile
    ceph osd crush rule create-replicated my_replicated_rule default host
    ~~~
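
    For completeness, the resulting profile can be double-checked with
    something like this (output omitted here):

    ~~~
    # Show the parameters the profile actually ended up with:
    ceph osd erasure-code-profile get my_ec_profile
    ~~~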

    My Crush Map is:

    ~~~
    # begin crush map
    tunable choose_local_tries 0
    tunable choose_local_fallback_tries 0
    tunable choose_total_tries 50
    tunable chooseleaf_descend_once 1
    tunable chooseleaf_vary_r 1
    tunable chooseleaf_stable 1
    tunable straw_calc_version 1
    tunable allowed_bucket_algs 54

    # devices
    device 0 osd.0 class hdd
    device 1 osd.1 class hdd
    device 2 osd.2 class hdd
    device 3 osd.3 class hdd
    device 4 osd.4 class hdd
    device 5 osd.5 class hdd
    device 6 osd.6 class hdd
    device 7 osd.7 class hdd
    device 8 osd.8 class hdd
    device 9 osd.9 class hdd
    device 10 osd.10 class hdd
    device 11 osd.11 class hdd
    device 12 osd.12 class hdd
    device 13 osd.13 class hdd
    device 14 osd.14 class hdd
    device 15 osd.15 class hdd
    device 16 osd.16 class hdd
    device 17 osd.17 class hdd
    device 18 osd.18 class hdd
    device 19 osd.19 class hdd
    device 20 osd.20 class hdd

    # types
    type 0 osd
    type 1 host
    type 2 chassis
    type 3 rack
    type 4 row
    type 5 pdu
    type 6 pod
    type 7 room
    type 8 datacenter
    type 9 zone
    type 10 region
    type 11 root

    # buckets
    host ceph_1 {
       id -3            # do not change unnecessarily
       id -4 class hdd  # do not change unnecessarily
       # weight 38.09564
       alg straw2
       hash 0  # rjenkins1
       item osd.0 weight 5.34769
       item osd.1 weight 5.45799
       item osd.2 weight 5.45799
       item osd.3 weight 5.45799
       item osd.4 weight 5.45799
       item osd.5 weight 5.45799
       item osd.6 weight 5.45799
    }
    host ceph_2 {
       id -5            # do not change unnecessarily
       id -6 class hdd  # do not change unnecessarily
       # weight 38.09564
       alg straw2
       hash 0  # rjenkins1
       item osd.7 weight 5.34769
       item osd.8 weight 5.45799
       item osd.9 weight 5.45799
       item osd.10 weight 5.45799
       item osd.11 weight 5.45799
       item osd.12 weight 5.45799
       item osd.13 weight 5.45799
    }
    host ceph_3 {
       id -7            # do not change unnecessarily
       id -8 class hdd  # do not change unnecessarily
       # weight 38.09564
       alg straw2
       hash 0  # rjenkins1
       item osd.14 weight 5.34769
       item osd.15 weight 5.45799
       item osd.16 weight 5.45799
       item osd.17 weight 5.45799
       item osd.18 weight 5.45799
       item osd.19 weight 5.45799
       item osd.20 weight 5.45799
    }
    root default {
       id -1            # do not change unnecessarily
       id -2 class hdd  # do not change unnecessarily
       # weight 114.28693
       alg straw2
       hash 0  # rjenkins1
       item ceph_1 weight 38.09564
       item ceph_2 weight 38.09564
       item ceph_3 weight 38.09564
    }

    # rules
    rule replicated_rule {
       id 0
       type replicated
       step take default
       step chooseleaf firstn 0 type host
       step emit
    }
    rule my_replicated_rule {
       id 1
       type replicated
       step take default
       step chooseleaf firstn 0 type host
       step emit
    }
    rule my_ec_rule {
       id 2
       type erasure
       step set_chooseleaf_tries 5
       step set_choose_tries 100
       step take default
       step choose indep 3 type host
       step chooseleaf indep 2 type osd
       step emit
    }

    # end crush map
    ~~~
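
    As an aside, the mapping produced by my_ec_rule can be sanity-checked
    offline with crushtool; a rough sketch, assuming the live map is first
    dumped to a file (the file name here is just an example):

    ~~~
    # Dump the compiled CRUSH map, then test rule id 2 (my_ec_rule) with 6 shards:
    ceph osd getcrushmap -o crush.bin
    crushtool -i crush.bin --test --rule 2 --num-rep 6 --show-mappings | head
    ~~~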

    Finally I create a pool:

    ~~~
    ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
    ceph osd pool application enable my_meta_pool rbd
    rbd pool init my_meta_pool
    rbd pool init my_pool
    rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool --image-feature journaling
    ~~~
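
    For reference, the pool's settings (min_size in particular) and its PG
    states can be inspected with standard commands like these:

    ~~~
    # Pool flags, size/min_size and EC profile in use:
    ceph osd pool ls detail | grep my_pool
    ceph osd pool get my_pool min_size
    # Per-PG state for the pool (first few lines only):
    ceph pg ls-by-pool my_pool | head
    ~~~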

    So all this is to have some VMs (oVirt VMs, for the record) with
    automatic failover in the case of a Ceph Node loss - i.e. I was trying
    to "replicate" a 3-disk RAID 5 array across the Ceph Nodes, so that I
    could lose a Node and still have a working set of VMs.

    However, I took one of the Ceph Nodes down (gracefully) for some
    maintenance the other day and I lost *all* the VMs (i.e. oVirt
    complained that there was no active pool). As soon as I brought the
    down node back up, everything was good again.

    So my question is: What did I do wrong with my config?

    Should I, for example, change the EC Profile to `k=2, m=1`? But how is
    that practically different from `k=4, m=2`? Yes, the latter spreads the
    pool over more disks, but it should still only put 2 disks on each
    node, shouldn't it?

    Thanks in advance

    Cheers

    Dulux-Oz

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



