EC Profiles & DR

Hi All,

Looking for some help/explanation around erasure code pools, etc.

I set up a 3-node Ceph (Quincy) cluster, with each box holding 7 OSDs (HDDs) and each box running a Monitor, a Manager, and an iSCSI Gateway. For the record, the cluster runs beautifully, without resource issues, etc.

I created an Erasure Code Profile and the associated CRUSH rules:

~~~
ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 crush-failure-domain=osd
ceph osd crush rule create-erasure my_ec_rule my_ec_profile
ceph osd crush rule create-replicated my_replicated_rule default host
~~~
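
If it helps, the resulting profile and rules can be inspected like this (I've left the output out for brevity):

~~~
# Show the erasure-code profile settings (plugin, k, m, crush-failure-domain)
ceph osd erasure-code-profile get my_ec_profile

# Dump the CRUSH rules that were created
ceph osd crush rule dump my_ec_rule
ceph osd crush rule dump my_replicated_rule
~~~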

My Crush Map is:

~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph_1 {
  id -3            # do not change unnecessarily
  id -4 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.0 weight 5.34769
  item osd.1 weight 5.45799
  item osd.2 weight 5.45799
  item osd.3 weight 5.45799
  item osd.4 weight 5.45799
  item osd.5 weight 5.45799
  item osd.6 weight 5.45799
}
host ceph_2 {
  id -5            # do not change unnecessarily
  id -6 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.7 weight 5.34769
  item osd.8 weight 5.45799
  item osd.9 weight 5.45799
  item osd.10 weight 5.45799
  item osd.11 weight 5.45799
  item osd.12 weight 5.45799
  item osd.13 weight 5.45799
}
host ceph_3 {
  id -7            # do not change unnecessarily
  id -8 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.14 weight 5.34769
  item osd.15 weight 5.45799
  item osd.16 weight 5.45799
  item osd.17 weight 5.45799
  item osd.18 weight 5.45799
  item osd.19 weight 5.45799
  item osd.20 weight 5.45799
}
root default {
  id -1            # do not change unnecessarily
  id -2 class hdd  # do not change unnecessarily
  # weight 114.28693
  alg straw2
  hash 0  # rjenkins1
  item ceph_1 weight 38.09564
  item ceph_2 weight 38.09564
  item ceph_3 weight 38.09564
}

# rules
rule replicated_rule {
  id 0
  type replicated
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule my_replicated_rule {
  id 1
  type replicated
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule my_ec_rule {
  id 2
  type erasure
  step set_chooseleaf_tries 5
  step set_choose_tries 100
  step take default
  step choose indep 3 type host
  step chooseleaf indep 2 type osd
  step emit
}

# end crush map
~~~
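
(For reference, a decompiled map like the one above can be produced with something along these lines:)

~~~
# Grab the binary CRUSH map from the cluster and decompile it to text
ceph osd getcrushmap -o /tmp/crush.bin
crushtool -d /tmp/crush.bin -o /tmp/crush.txt
~~~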

Finally, I created the pool and an RBD image:

~~~
ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
ceph osd pool application enable my_meta_pool rbd
rbd pool init my_meta_pool
rbd pool init my_pool
rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool --image-feature journaling
~~~
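
In case it's relevant, the pool's size and min_size can be checked as below - I'd expect size to be k+m = 6 for this profile, and I believe min_size defaults to k+1 on recent releases (I haven't pasted the output here):

~~~
# Pool-level settings that control when PGs go inactive on node loss
ceph osd pool get my_pool size        # should be k+m = 6 for this profile
ceph osd pool get my_pool min_size    # defaults to k+1 = 5, if I understand correctly
~~~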

So all this is to have some VMs (oVirt VMs, for the record) with automatic failover in the case of a Ceph Node loss - i.e. I was trying to "replicate" a 3-Disk RAID 5 array across the Ceph Nodes, so that I could lose a Node and still have a working set of VMs.

However, I took one of the Ceph Nodes down (gracefully) for some maintenance the other day and I lost *all* the VMs (i.e. oVirt complained that there was no active pool). As soon as I brought the down node back up, everything was good again.
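
Next time I take a node down I'm happy to capture the cluster state with something like the following, if that would help:

~~~
# Overall health plus any inactive/stuck PGs while the node is down
ceph health detail
ceph pg dump_stuck inactive
ceph osd tree
~~~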

So my question is: What did I do wrong with my config?

Should I, for example, change the EC Profile to `k=2, m=1`? And how would that be practically different from `k=4, m=2` - yes, the latter spreads the pool over more disks, but it should still only use 2 disks (chunks) on each node, shouldn't it?
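
To sanity-check that assumption about placement, I gather the rule can be exercised offline with crushtool - a sketch, using the binary map dumped earlier, with rule id 2 (my_ec_rule) and num-rep 6 (k+m):

~~~
# Show which OSDs the EC rule maps each PG to, to confirm that no more
# than two chunks land on any one host
crushtool -i /tmp/crush.bin --test --rule 2 --num-rep 6 --show-mappings
~~~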

Thanks in advance

Cheers

Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



