Re: best practices for EC pools

Hi Francois,

Is it correct that recovery will be forbidden by the CRUSH rule if a node is down?

Yes, that is correct. failure-domain=host means no two chunks of the same PG can be placed on the same host. So if a PG is split into 6 chunks, they all sit on different hosts; with one host down there is no spare host left to recover onto, so no recovery is possible at that point (for the EC pool).
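If you want to verify that, you can check where the chunks of a PG actually landed. Something like the following should work (the PG ID 2.0 is just an example, take one from your EC pool):

# list the PGs of the EC pool together with their acting sets
ceph pg ls-by-pool ewos1-prod_cinder_ec
# or map a single PG to its OSDs
ceph pg map 2.0
# then check which host each of those OSDs belongs to
ceph osd tree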

After rebooting all nodes we noticed that recovery was slow, maybe half an hour, even though all pools are currently empty (new install).
This is odd...

If the pools are empty I wouldn't expect that either. Is restarting a single OSD also that slow, or does it only happen when you reboot the whole cluster?
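To narrow it down you could restart just one OSD and watch how long the PGs take to become active+clean again. Roughly like this (assuming a systemd deployment; osd.0 is just an example):

# prevent rebalancing during the test
ceph osd set noout
# restart a single OSD
systemctl restart ceph-osd@0
# watch the cluster status/log until all PGs are active+clean again
ceph -w
# remove the flag afterwards
ceph osd unset noout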

Which k&m values are preferred on 6 nodes?

It depends on the failure scenarios you expect and how many concurrent failures you need to tolerate. I would keep failure-domain=host (with only 4 OSDs per host). As for the k and m values, 3+2 would make sense, I guess. That profile leaves one host free for recovery, and two OSDs of a PG's acting set can still fail without data loss, so it is as resilient as the 4+2 profile. This is one approach, so please don't read it as *the* solution for your environment.
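Untested, but based on the commands in your mail a 3+2 setup would look roughly like this (profile name, pool name and PG count are just examples; an existing profile can't be changed once a pool uses it, hence a new one):

# profile with k=3, m=2 and the same root / failure domain / device class as your ec42 profile
ceph osd erasure-code-profile set ec32 k=3 m=2 crush-root=default crush-failure-domain=host crush-device-class=nvme
# new pool using that profile (adjust the PG count for your cluster)
ceph osd pool create ewos1-prod_cinder_ec32 256 256 erasure ec32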

Regards,
Eugen


Quoting Scheurer François <francois.scheurer@xxxxxxxxxxxx>:

Dear All


We created an erasure-coded pool with k=4 m=2 and failure-domain=host, but have only 6 OSD nodes. Is it correct that recovery will be forbidden by the CRUSH rule if a node is down?

After rebooting all nodes we noticed that recovery was slow, maybe half an hour, even though all pools are currently empty (new install).
This is odd...

Could it be related to k+m being equal to the number of nodes? (4+2=6)
step set_choose_tries 100 was already set in the EC crush rule.

rule ewos1-prod_cinder_ec {
	id 2
	type erasure
	min_size 3
	max_size 6
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default class nvme
	step chooseleaf indep 0 type host
	step emit
}

ceph osd erasure-code-profile set ec42 k=4 m=2 crush-root=default crush-failure-domain=host crush-device-class=nvme
ceph osd pool create ewos1-prod_cinder_ec 256 256 erasure ec42

ceph version 12.2.10-543-gfc6f0c7299 (fc6f0c7299e3442e8a0ab83260849a6249ce7b5f) luminous (stable)

  cluster:
    id:     b5e30221-a214-353c-b66b-8c37b4349123
    health: HEALTH_WARN
            noout flag(s) set
            Reduced data availability: 125 pgs inactive, 32 pgs peering

  services:
    mon: 3 daemons, quorum ewos1-osd1-prod,ewos1-osd3-prod,ewos1-osd5-prod
    mgr: ewos1-osd5-prod(active), standbys: ewos1-osd3-prod, ewos1-osd1-prod
    osd: 24 osds: 24 up, 24 in
         flags noout

  data:
    pools:   4 pools, 1600 pgs
    objects: 0 objects, 0B
    usage:   24.3GiB used, 43.6TiB / 43.7TiB avail
    pgs:     7.812% pgs not active
             1475 active+clean
             93   activating
             32   peering


Which k&m values are preferred on 6 nodes?
BTW, we plan to use this EC pool as a second RBD pool in OpenStack, with the main (first) RBD pool being replicated with size=3; the cluster is NVMe SSD only.
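For the record, the rough plan (not tested yet) is to enable overwrites on the EC pool and reference it as an RBD data pool, along these lines (the replicated pool and image names are placeholders):

# partial overwrites are needed for RBD on an EC pool (BlueStore OSDs required)
ceph osd pool set ewos1-prod_cinder_ec allow_ec_overwrites true
# image metadata stays in the replicated pool, the data goes to the EC pool
rbd create --size 10G --pool <replicated-rbd-pool> --data-pool ewos1-prod_cinder_ec <image-name>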


Thanks for your help!



Best Regards
Francois Scheurer



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



