Re: EC Profiles & DR

And the second issue is that with k=4, m=2 you'll have min_size = 5, which means that if one host is down your PGs become inactive; that is most likely what you experienced.
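To make the arithmetic explicit, here's a sketch of the failure math (assuming the default EC min_size of k + 1 and the pool name from the original post), plus the commands to inspect and, with care, adjust it:

~~~
# k=4, m=2 -> 6 shards; the rule places 2 shards on each of the 3 hosts
# default min_size = k + 1 = 5
# one host down -> only 4 shards remain, 4 < 5 -> PGs go inactive

# Check the pool's current min_size:
ceph osd pool get my_pool min_size
# Lowering it to k keeps PGs active with one host down, but leaves
# no redundancy margin during the outage, so treat it as a last resort:
ceph osd pool set my_pool min_size 4
~~~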

Quoting David Rivera <rivera.david87@xxxxxxxxx>:

The first problem here is that you are using crush-failure-domain=osd when
you should use crush-failure-domain=host. With three hosts, you would have
to use k=2, m=1; this is not recommended in a production environment.
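For illustration, a profile along those lines might look like the following; the profile and rule names here are placeholders, and note that a profile already in use by a pool cannot simply be edited in place:

~~~
ceph osd erasure-code-profile set my_ec_profile_k2m1 plugin=jerasure k=2 m=1 \
    crush-failure-domain=host
ceph osd crush rule create-erasure my_ec_rule_k2m1 my_ec_profile_k2m1
~~~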

On Mon, Dec 4, 2023, 23:26 duluxoz <duluxoz@xxxxxxxxx> wrote:

Hi All,

Looking for some help/explanation around erasure code pools, etc.

I set up a 3-node Ceph (Quincy) cluster, with each box holding 7 OSDs
(HDDs) and each box running Monitor, Manager, and iSCSI Gateway daemons.
For the record, the cluster runs beautifully, without resource issues.

I created an erasure code profile and CRUSH rules:

~~~
ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 \
    crush-failure-domain=osd
ceph osd crush rule create-erasure my_ec_rule my_ec_profile
ceph osd crush rule create-replicated my_replicated_rule default host
~~~
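
(For reference, the resulting profile can be dumped back to confirm its settings:)

~~~
ceph osd erasure-code-profile get my_ec_profile
~~~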

My CRUSH map is:

~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph_1 {
   id -3            # do not change unnecessarily
   id -4 class hdd  # do not change unnecessarily
   # weight 38.09564
   alg straw2
   hash 0  # rjenkins1
   item osd.0 weight 5.34769
   item osd.1 weight 5.45799
   item osd.2 weight 5.45799
   item osd.3 weight 5.45799
   item osd.4 weight 5.45799
   item osd.5 weight 5.45799
   item osd.6 weight 5.45799
}
host ceph_2 {
   id -5            # do not change unnecessarily
   id -6 class hdd  # do not change unnecessarily
   # weight 38.09564
   alg straw2
   hash 0  # rjenkins1
   item osd.7 weight 5.34769
   item osd.8 weight 5.45799
   item osd.9 weight 5.45799
   item osd.10 weight 5.45799
   item osd.11 weight 5.45799
   item osd.12 weight 5.45799
   item osd.13 weight 5.45799
}
host ceph_3 {
   id -7            # do not change unnecessarily
   id -8 class hdd  # do not change unnecessarily
   # weight 38.09564
   alg straw2
   hash 0  # rjenkins1
   item osd.14 weight 5.34769
   item osd.15 weight 5.45799
   item osd.16 weight 5.45799
   item osd.17 weight 5.45799
   item osd.18 weight 5.45799
   item osd.19 weight 5.45799
   item osd.20 weight 5.45799
}
root default {
   id -1            # do not change unnecessarily
   id -2 class hdd  # do not change unnecessarily
   # weight 114.28693
   alg straw2
   hash 0  # rjenkins1
   item ceph_1 weight 38.09564
   item ceph_2 weight 38.09564
   item ceph_3 weight 38.09564
}

# rules
rule replicated_rule {
   id 0
   type replicated
   step take default
   step chooseleaf firstn 0 type host
   step emit
}
rule my_replicated_rule {
   id 1
   type replicated
   step take default
   step chooseleaf firstn 0 type host
   step emit
}
rule my_ec_rule {
   id 2
   type erasure
   step set_chooseleaf_tries 5
   step set_choose_tries 100
   step take default
   step choose indep 3 type host
   step chooseleaf indep 2 type osd
   step emit
}

# end crush map
~~~
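
(As a sanity check, the compiled map can be run through crushtool to see which OSDs the EC rule would pick, without touching any data; rule id 2 is my_ec_rule above, and 6 is k + m:)

~~~
ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 2 --num-rep 6 --show-mappings
# Report any PGs that could not be mapped to 6 distinct OSDs:
crushtool -i crush.bin --test --rule 2 --num-rep 6 --show-bad-mappings
~~~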

Finally, I created a pool and an RBD image:

~~~
ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
ceph osd pool application enable my_meta_pool rbd
rbd pool init my_meta_pool
rbd pool init my_pool
rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool \
    --image-feature journaling
~~~
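
(Using the names from above, the resulting pool and image settings can be double-checked with:)

~~~
ceph osd pool ls detail        # size, min_size, crush_rule, EC profile per pool
rbd info my_pool/my_disk_1     # data pool and image features
~~~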

So all this is to have some VMs (oVirt VMs, for the record) with
automatic failover in the case of a Ceph node loss, i.e. I was trying to
"replicate" a 3-disk RAID 5 array across the Ceph nodes, so that I could
lose a node and still have a working set of VMs.

However, I took one of the Ceph nodes down (gracefully) for some
maintenance the other day and I lost *all* the VMs (i.e. oVirt complained
that there was no active pool). As soon as I brought the downed node back
up, everything was good again.

So my question is: What did I do wrong with my config?

Should I, for example, change the EC profile to `k=2, m=1`? And how is
that practically different from `k=4, m=2`? Yes, the latter spreads the
pool over more disks, but it should still only use 2 disks on each node,
shouldn't it?
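
(For reference, one way to check where the shards actually land is to list the pool's PGs with their acting sets and map the OSD ids back to hosts; pool and OSD names as above:)

~~~
# Each PG's acting set lists the OSDs holding its shards:
ceph pg ls-by-pool my_pool
# Map an OSD id back to its host:
ceph osd find 14
~~~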

Thanks in advance

Cheers

Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
