ceph crush map rules for EC pools and out OSDs?

Hi,

 

I have 5 data nodes (bluestore, kraken), each with 24 OSDs.

I enabled the optimal crush tunables.

I’d like to try to “really” use EC pools, but until now I’ve faced cluster lockups when I was using 3+2 EC pools with a host failure domain.

When a host was down for instance ;)

 

Since I’d like erasure coding to be more than a “nice-to-have feature once you have 12+ ceph data nodes”, I wanted to try this:

 

- Use a 14+6 EC rule
- And for the 20 chunks of each PG:
  o select 4 hosts
  o on each of these hosts, select 5 OSDs

 

In order to do that, I created this rule in the crush map:

 

rule 4hosts_20shards {
        ruleset 3
        type erasure
        min_size 20
        max_size 20
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 4 type host
        step chooseleaf indep 5 type osd
        step emit
}
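
As a side note, here is roughly how such a rule can be sanity-checked offline with crushtool before injecting it (the file paths are only examples, and ruleset 3 is the one above):

# grab and decompile the current crush map, then add the rule above to the text file
ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
# recompile and simulate 20-shard placements for ruleset 3
crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
crushtool --test -i /tmp/crushmap.new --rule 3 --num-rep 20 --show-mappings
# any mapping that could not place all 20 shards shows up here
crushtool --test -i /tmp/crushmap.new --rule 3 --num-rep 20 --show-bad-mappings
# inject it once it looks sane
ceph osd setcrushmap -i /tmp/crushmap.new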

 

I then created an EC pool with this erasure profile:

ceph osd erasure-code-profile set erasurep14_6_osd  ruleset-failure-domain=osd k=14 m=6
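
The pool itself was then created on top of that profile and rule, more or less like this (the pool name and PG counts are only examples):

# create the EC pool with the 14+6 profile and the custom crush rule
ceph osd pool create ecpool_14_6 1024 1024 erasure erasurep14_6_osd 4hosts_20shards
# double-check what the pool ended up with
ceph osd pool get ecpool_14_6 erasure_code_profile
# (crush_ruleset was renamed to crush_rule in later releases)
ceph osd pool get ecpool_14_6 crush_ruleset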

 

I hoped this would allow losing one host completely without locking up the cluster, and I have the impression this is working…

But. There’s always a but ;)

 

I tried taking all the OSDs of one node down by stopping its ceph-osd daemons.

And according to ceph, the cluster is unhealthy.

“ceph health detail” gives me, for instance, this (for the 3+2 and the 14+6 pools respectively):

 

pg 5.18b is active+undersized+degraded, acting [57,47,2147483647,23,133]

pg 9.186 is active+undersized+degraded, acting [2147483647,2147483647,2147483647,2147483647,2147483647,133,142,125,131,137,50,48,55,65,52,16,13,18,22,3]
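
For what it’s worth, 2147483647 is 0x7fffffff, the placeholder CRUSH uses for a shard it could not place anywhere. A quick and dirty way to count or list the PGs with such unplaced shards, just with grep:

# count the undersized PGs that have at least one unplaced shard
ceph health detail | grep undersized | grep -c 2147483647
# or list them together with their up/acting sets
ceph pg dump pgs_brief | grep 2147483647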

 

My question therefore is: why aren’t the down PGs remapped onto my 5th data node, since I made sure the 20 EC shards were spread across only 4 hosts?

I thought/hoped that because OSDs were down, the data would be rebuilt onto another OSD/host?

I can understand that the 3+2 EC pool cannot allocate OSDs on another host, because 3+2 = 5 already uses all five hosts, but I don’t understand why the 14+6 EC pool/PGs do not rebuild somewhere else?

 

I do not find anything useful in a “ceph pg query”: the up and acting sets are equal and do contain the 2147483647 value (which means “none”, as far as I understood).
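
In case it helps, this is how those sets can be pulled out for one of the PGs above (“ceph pg map” needs nothing extra; the jq variant assumes jq is installed):

# quick view of the up and acting sets of one degraded PG
ceph pg map 9.186
# or extract them from the full query output
ceph pg 9.186 query | jq '{up: .up, acting: .acting}'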

 

I’ve also tried to “ceph osd out” all the OSDs of one host: in that case, the 3+2 EC PGs behave as previously, but the 14+6 EC PGs seem happy despite the fact that they still report the out OSDs as up and acting.
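
For completeness, marking them out was done roughly like this (“datanode5” is only an example bucket name from my crush tree; if “ceph osd ls-tree” is not available on your release, the ids can be read off “ceph osd tree” instead):

# mark every OSD under one crush host bucket out
for osd in $(ceph osd ls-tree datanode5); do
    ceph osd out "$osd"
done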

Is my crush rule that wrong?

Is it possible to do what I want?

 

Thanks for any hints…

 

Regards

Frederic

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
