Usually EC requires at least k+1 shards to be up and active for the pool to
keep working. Setting min_size to k risks data loss.

________________________________
From: duluxoz <duluxoz@xxxxxxxxx>
Sent: 05 December 2023 09:01
To: rivera.david87@xxxxxxxxx <rivera.david87@xxxxxxxxx>; matthew@xxxxxxxxxxxxxxx <matthew@xxxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: Re: EC Profiles & DR

Thanks David, I knew I had something wrong :-)

Just for my own edification: why is k=2, m=1 not recommended for
production? Is it considered too "fragile", or something else?

Cheers

Dulux-Oz

On 05/12/2023 19:53, David Rivera wrote:
> The first problem here is that you are using crush-failure-domain=osd
> when you should use crush-failure-domain=host. With three hosts, you
> would have to use k=2, m=1; this is not recommended in a production
> environment.
>
> On Mon, Dec 4, 2023, 23:26 duluxoz <duluxoz@xxxxxxxxx> wrote:
>
> Hi All,
>
> Looking for some help/explanation around erasure code pools, etc.
>
> I set up a 3-node Ceph (Quincy) cluster, with each box holding 7 OSDs
> (HDDs) and each box running Monitor, Manager, and iSCSI Gateway
> services. For the record, the cluster runs beautifully, without
> resource issues, etc.
>
> I created an Erasure Code Profile, etc:
>
> ~~~
> ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 crush-failure-domain=osd
> ceph osd crush rule create-erasure my_ec_rule my_ec_profile
> ceph osd crush rule create-replicated my_replicated_rule default host
> ~~~
>
> My Crush Map is:
>
> ~~~
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable chooseleaf_vary_r 1
> tunable chooseleaf_stable 1
> tunable straw_calc_version 1
> tunable allowed_bucket_algs 54
>
> # devices
> device 0 osd.0 class hdd
> device 1 osd.1 class hdd
> device 2 osd.2 class hdd
> device 3 osd.3 class hdd
> device 4 osd.4 class hdd
> device 5 osd.5 class hdd
> device 6 osd.6 class hdd
> device 7 osd.7 class hdd
> device 8 osd.8 class hdd
> device 9 osd.9 class hdd
> device 10 osd.10 class hdd
> device 11 osd.11 class hdd
> device 12 osd.12 class hdd
> device 13 osd.13 class hdd
> device 14 osd.14 class hdd
> device 15 osd.15 class hdd
> device 16 osd.16 class hdd
> device 17 osd.17 class hdd
> device 18 osd.18 class hdd
> device 19 osd.19 class hdd
> device 20 osd.20 class hdd
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 zone
> type 10 region
> type 11 root
>
> # buckets
> host ceph_1 {
>     id -3        # do not change unnecessarily
>     id -4 class hdd        # do not change unnecessarily
>     # weight 38.09564
>     alg straw2
>     hash 0    # rjenkins1
>     item osd.0 weight 5.34769
>     item osd.1 weight 5.45799
>     item osd.2 weight 5.45799
>     item osd.3 weight 5.45799
>     item osd.4 weight 5.45799
>     item osd.5 weight 5.45799
>     item osd.6 weight 5.45799
> }
> host ceph_2 {
>     id -5        # do not change unnecessarily
>     id -6 class hdd        # do not change unnecessarily
>     # weight 38.09564
>     alg straw2
>     hash 0    # rjenkins1
>     item osd.7 weight 5.34769
>     item osd.8 weight 5.45799
>     item osd.9 weight 5.45799
>     item osd.10 weight 5.45799
>     item osd.11 weight 5.45799
>     item osd.12 weight 5.45799
>     item osd.13 weight 5.45799
> }
> host ceph_3 {
>     id -7        # do not change unnecessarily
>     id -8 class hdd        # do not change unnecessarily
>     # weight 38.09564
>     alg straw2
>     hash 0    # rjenkins1
>     item osd.14 weight 5.34769
>     item osd.15 weight 5.45799
>     item osd.16 weight 5.45799
>     item osd.17 weight 5.45799
>     item osd.18 weight 5.45799
>     item osd.19 weight 5.45799
>     item osd.20 weight 5.45799
> }
> root default {
>     id -1        # do not change unnecessarily
>     id -2 class hdd        # do not change unnecessarily
>     # weight 114.28693
>     alg straw2
>     hash 0    # rjenkins1
>     item ceph_1 weight 38.09564
>     item ceph_2 weight 38.09564
>     item ceph_3 weight 38.09564
> }
>
> # rules
> rule replicated_rule {
>     id 0
>     type replicated
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
> rule my_replicated_rule {
>     id 1
>     type replicated
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
> rule my_ec_rule {
>     id 2
>     type erasure
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default
>     step choose indep 3 type host
>     step chooseleaf indep 2 type osd
>     step emit
> }
>
> # end crush map
> ~~~
>
> Finally, I create a pool:
>
> ~~~
> ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
> ceph osd pool application enable my_meta_pool rbd
> rbd pool init my_meta_pool
> rbd pool init my_pool
> rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool --image-feature journaling
> ~~~
>
> So all of this is to have some VMs (oVirt VMs, for the record) with
> automatic fail-over in the case of a Ceph Node loss - ie I was trying to
> "replicate" a 3-disk RAID 5 array across the Ceph Nodes, so that I could
> lose a Node and still have a working set of VMs.
>
> However, I took one of the Ceph Nodes down (gracefully) for some
> maintenance the other day, and I lost *all* of the VMs (ie oVirt
> complained that there was no active pool). As soon as I brought the
> down node back up, everything was good again.
>
> So my question is: what did I do wrong with my config?
>
> Should I, for example, change the EC Profile to `k=2, m=1`? But how is
> that practically different from `k=4, m=2`? Yes, the latter spreads the
> pool over more disks, but it should still only put 2 disks on each
> node, shouldn't it?
>
> Thanks in advance
>
> Cheers
>
> Dulux-Oz
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
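[To put numbers on the advice at the top of this thread: with k=4, m=2 and a
crush rule that places two shards per host across three hosts, losing one host
drops a PG to exactly k=4 surviving shards, which is below the default
min_size of k+1=5, so the PGs go inactive and the pool stops serving I/O. A
minimal sketch of that arithmetic in plain Python - `pg_active` is a
hypothetical illustration, not a Ceph API:]

```python
def pg_active(k: int, m: int, min_size: int, shards_lost: int) -> bool:
    """An EC PG keeps serving I/O only while at least min_size
    of its k+m shards survive."""
    surviving = k + m - shards_lost
    return surviving >= min_size

# k=4, m=2 with the rule above places 2 shards per host on 3 hosts,
# so losing one host costs 2 shards at once.
SHARDS_PER_HOST = 2

assert pg_active(4, 2, 5, 0)                        # all hosts up: active
assert not pg_active(4, 2, 5, SHARDS_PER_HOST)      # one host down: inactive
# Dropping min_size to k=4 keeps the PG active with zero redundancy left,
# which is the data-loss risk mentioned above.
assert pg_active(4, 2, 4, SHARDS_PER_HOST)
```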