Re: best practices for EC pools

Hi Francois,

Is it correct that recovery will be forbidden by the CRUSH rule if a node is down?

Yes, that is correct. failure-domain=host means no two chunks of the same PG can be placed on the same host. So if a PG is split into 6 chunks, they all sit on different hosts; with one host down there is no spare host left to recover onto, so no recovery is possible at that point (for the EC pool).
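If you want to verify that, you can check where the chunks of a PG actually landed. Something like the following should work (the PG ID 2.0 is just an example, take one from your EC pool):

# list the PGs of the EC pool together with their acting sets
ceph pg ls-by-pool ewos1-prod_cinder_ec
# or map a single PG to its OSDs
ceph pg map 2.0
# then check which host each of those OSDs belongs to
ceph osd tree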

After rebooting all nodes we noticed that recovery was slow, maybe half an hour, even though all pools are currently empty (new install).
This is odd...

If the pools are empty I wouldn't expect that either. Is restarting a single OSD also that slow, or does it only happen when you reboot the whole cluster?
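To narrow it down you could restart just one OSD and watch how long the PGs take to become active+clean again. Roughly like this (assuming a systemd deployment; osd.0 is just an example):

# prevent rebalancing during the test
ceph osd set noout
# restart a single OSD
systemctl restart ceph-osd@0
# watch the cluster status/log until all PGs are active+clean again
ceph -w
# remove the flag afterwards
ceph osd unset noout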

Which k&m values are preferred on 6 nodes?

It depends on the failure scenarios you expect and how many concurrent failures you need to tolerate. I would keep failure-domain=host (with only 4 OSDs per host). As for the k and m values, 3+2 would make sense, I guess. That profile leaves one host free for recovery, and two OSDs of a PG's acting set can still fail without data loss, so it is as resilient as the 4+2 profile. This is one approach, so please don't read it as *the* solution for your environment.
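Untested, but based on the commands in your mail a 3+2 setup would look roughly like this (profile name, pool name and PG count are just examples; an existing profile can't be changed once a pool uses it, hence a new one):

# profile with k=3, m=2 and the same root / failure domain / device class as your ec42 profile
ceph osd erasure-code-profile set ec32 k=3 m=2 crush-root=default crush-failure-domain=host crush-device-class=nvme
# new pool using that profile (adjust the PG count for your cluster)
ceph osd pool create ewos1-prod_cinder_ec32 256 256 erasure ec32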

Regards,
Eugen


Quoting Scheurer François <francois.scheurer@xxxxxxxxxxxx>:

Dear All


We created an erasure-coded pool with k=4 m=2 and failure-domain=host, but have only 6 OSD nodes. Is it correct that recovery will be forbidden by the CRUSH rule if a node is down?

After rebooting all nodes we noticed that recovery was slow, maybe half an hour, even though all pools are currently empty (new install).
This is odd...

Could it be related to k+m being equal to the number of nodes? (4+2=6)
step set_choose_tries 100 was already set in the EC crush rule.

rule ewos1-prod_cinder_ec {
	id 2
	type erasure
	min_size 3
	max_size 6
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default class nvme
	step chooseleaf indep 0 type host
	step emit
}

ceph osd erasure-code-profile set ec42 k=4 m=2 crush-root=default crush-failure-domain=host crush-device-class=nvme
ceph osd pool create ewos1-prod_cinder_ec 256 256 erasure ec42

ceph version 12.2.10-543-gfc6f0c7299 (fc6f0c7299e3442e8a0ab83260849a6249ce7b5f) luminous (stable)

  cluster:
    id:     b5e30221-a214-353c-b66b-8c37b4349123
    health: HEALTH_WARN
            noout flag(s) set
            Reduced data availability: 125 pgs inactive, 32 pgs peering

  services:
    mon: 3 daemons, quorum ewos1-osd1-prod,ewos1-osd3-prod,ewos1-osd5-prod
    mgr: ewos1-osd5-prod(active), standbys: ewos1-osd3-prod, ewos1-osd1-prod
    osd: 24 osds: 24 up, 24 in
         flags noout

  data:
    pools:   4 pools, 1600 pgs
    objects: 0 objects, 0B
    usage:   24.3GiB used, 43.6TiB / 43.7TiB avail
    pgs:     7.812% pgs not active
             1475 active+clean
             93   activating
             32   peering


Which k&m values are preferred on 6 nodes?
BTW, we plan to use this EC pool as a second RBD pool in OpenStack, with the main (first) RBD pool being replicated with size=3; the cluster is NVMe SSD only.
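For the record, the rough plan (not tested yet) is to enable overwrites on the EC pool and reference it as an RBD data pool, along these lines (the replicated pool and image names are placeholders):

# partial overwrites are needed for RBD on an EC pool (BlueStore OSDs required)
ceph osd pool set ewos1-prod_cinder_ec allow_ec_overwrites true
# image metadata stays in the replicated pool, the data goes to the EC pool
rbd create --size 10G --pool <replicated-rbd-pool> --data-pool ewos1-prod_cinder_ec <image-name>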


Thanks for your help!



Best Regards
Francois Scheurer



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



