Just to add: a more general formula is that the number of nodes should be
greater than or equal to k + m + m, i.e. N >= k + 2m, for full recovery.

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Eugen Block
Sent: Thursday, February 7, 2019 8:47 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: best practices for EC pools

Hi Francois,

> Is that correct that recovery will be forbidden by the crush rule if a
> node is down?

Yes, that is correct: failure-domain=host means no two chunks of the same
PG can be on the same host. So if your PG is divided into 6 chunks, they
are all on different hosts, and no recovery is possible at this point
(for the EC pool).

> After rebooting all nodes we noticed that the recovery was slow, maybe
> half an hour, but all pools are currently empty (new install).
> This is odd...

If the pools are empty I also wouldn't expect that. Is restarting one OSD
also that slow, or is it just when you reboot the whole cluster?

> Which k&m values are preferred on 6 nodes?

It depends on the failures you expect and how many concurrent failures
you need to cover. I think I would keep failure-domain=host (with only 4
OSDs per host). As for the k and m values, 3+2 would make sense, I guess.
That profile would leave one host for recovery, and two OSDs of one PG's
acting set could fail without data loss, so it is as resilient as the 4+2
profile. This is one approach, so please don't read this as *the*
solution for your environment.

Regards,
Eugen


Quoting Scheurer François <francois.scheurer@xxxxxxxxxxxx>:

> Dear All
>
>
> We created an erasure-coded pool with k=4 m=2 with failure-domain=host
> but have only 6 osd nodes.
> Is that correct that recovery will be forbidden by the crush rule if a
> node is down?
>
> After rebooting all nodes we noticed that the recovery was slow, maybe
> half an hour, but all pools are currently empty (new install).
> This is odd...
>
> Can it be related to the k+m being equal to the number of nodes? (4+2=6)
> step set_choose_tries 100 was already in the EC crush rule.
>
> rule ewos1-prod_cinder_ec {
>         id 2
>         type erasure
>         min_size 3
>         max_size 6
>         step set_chooseleaf_tries 5
>         step set_choose_tries 100
>         step take default class nvme
>         step chooseleaf indep 0 type host
>         step emit
> }
>
> ceph osd erasure-code-profile set ec42 k=4 m=2 crush-root=default \
>     crush-failure-domain=host crush-device-class=nvme
> ceph osd pool create ewos1-prod_cinder_ec 256 256 erasure ec42
>
> ceph version 12.2.10-543-gfc6f0c7299
> (fc6f0c7299e3442e8a0ab83260849a6249ce7b5f) luminous (stable)
>
>   cluster:
>     id:     b5e30221-a214-353c-b66b-8c37b4349123
>     health: HEALTH_WARN
>             noout flag(s) set
>             Reduced data availability: 125 pgs inactive, 32 pgs peering
>
>   services:
>     mon: 3 daemons, quorum ewos1-osd1-prod,ewos1-osd3-prod,ewos1-osd5-prod
>     mgr: ewos1-osd5-prod(active), standbys: ewos1-osd3-prod, ewos1-osd1-prod
>     osd: 24 osds: 24 up, 24 in
>          flags noout
>
>   data:
>     pools:   4 pools, 1600 pgs
>     objects: 0 objects, 0B
>     usage:   24.3GiB used, 43.6TiB / 43.7TiB avail
>     pgs:     7.812% pgs not active
>              1475 active+clean
>              93   activating
>              32   peering
>
>
> Which k&m values are preferred on 6 nodes?
> BTW, we plan to use this EC pool as a second rbd pool in Openstack,
> with the main first rbd pool being replicated size=3; it is nvme ssd
> only.
>
>
> Thanks for your help!
>
> Best Regards
> Francois Scheurer
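
To make the sizing rule concrete for this cluster: with failure-domain=host
and 6 hosts, a 4+2 profile places one chunk on every host, so a single host
outage leaves no spare host to recover the missing chunks onto; a 3+2 profile
leaves one spare host, so one host failure can be repaired, although the
N >= k + 2m bound (3 + 2 + 2 = 7) would only be met with a seventh host.
Below is a rough sketch of how the suggested 3+2 profile could be created,
assuming the same luminous syntax as above; the profile name "ec32" and the
pool name are placeholders, and the PG count should be sized for your cluster:

  ceph osd erasure-code-profile set ec32 k=3 m=2 crush-root=default \
      crush-failure-domain=host crush-device-class=nvme
  ceph osd erasure-code-profile get ec32     # verify k, m and failure domain
  ceph osd pool create ewos1-prod_cinder_ec32 256 256 erasure ec32

Note that k and m cannot be changed on an existing pool, so moving from 4+2
to 3+2 means creating a new pool with the new profile and migrating any data
(trivial while the pools are still empty).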

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com