Re: Erasure Code Setup

On 24/03/2014 21:37, Gruher, Joseph R wrote:
> In the IRC chat dmick helped me confirm that commands of the form "ceph osd erasure-code-profile" are only in the master branch and not in 0.78 (thanks dmick), so let me revise my query a bit:
> 
>  
> 
> 1. Does it seem likely that the problem described below is due to the failure domain, and that the solution is to change the failure domain to OSDs instead of the default of hosts?

Yes.
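
With k=9 and m=3 each PG needs k+m = 12 distinct placement targets, and with the default failure domain of host CRUSH can only choose between your 2 hosts, so every PG in the erasure pool stays incomplete. A quick way to confirm (a sketch from memory, so double-check the exact output on 0.78) is to dump the rule and the stuck PGs:

ceph osd crush rule dump ecruleset     # the "type" in the chooseleaf step is the failure domain
ceph pg dump_stuck inactive            # the incomplete PGs should all belong to pool 6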

> 
>  
> 
> 2. If so, how would you make such a change for an erasure code pool/ruleset in the 0.78 branch?
> 

params="erasure-code-k=9 erasure-code-m=3 erasure-code-ruleset-failure-domain=osd"
ceph osd crush rule create-erasure ecruleset $params
ceph osd pool create ecpool 12 12 erasure crush_ruleset=ecruleset $params
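
Once the pool is created against the new ruleset the PGs should go active+clean. A quick check (again just a sketch; the pool and object names above and below are only examples) would be:

ceph osd dump | grep ecpool            # confirm the pool uses the new crush_ruleset
ceph -s                                # the 12-chunk PGs should no longer be incomplete
rados --pool ecpool put test-object /etc/hosts
rados --pool ecpool stat test-object   # verify a write/read round trip works

If you would rather keep the existing mycontainers_1 pool, pointing it at the new rule with "ceph osd pool set mycontainers_1 crush_ruleset <rule id>" (the id is shown by ceph osd crush rule dump) should also work, although I have not tested that here.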

I hope that helps :-)

>  
> 
> Thanks!
> 
>  
> 
> -Joe
> 
>  
> 
> *From:* Gruher, Joseph R
> *Sent:* Monday, March 24, 2014 1:01 PM
> *To:* ceph-users@xxxxxxxxxxxxxx
> *Cc:* Gruher, Joseph R
> *Subject:* Erasure Code Setup
> 
>  
> 
> Hi Folks-
> 
>  
> 
> Having a bit of trouble with EC setup on 0.78.  Hoping someone can help me out.  I’ve got most of the pieces in place; I think I’m just having a problem with the ruleset.
> 
>  
> 
> I am running 0.78:
> 
> ceph --version
> 
> ceph version 0.78 (f6c746c314d7b87b8419b6e584c94bfe4511dbd4)
> 
>  
> 
> I created a new ruleset:
> 
> ceph osd crush rule create-erasure ecruleset
> 
>  
> 
> Then I created a new erasure code pool:
> 
> ceph osd pool create mycontainers_1 1800 1800 erasure crush_ruleset=ecruleset erasure-code-k=9 erasure-code-m=3
> 
>  
> 
> Pool exists:
> 
> ceph@joceph-admin01:/etc/ceph$ ceph osd dump
> 
> epoch 106
> 
> fsid b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
> 
> created 2014-03-24 12:06:28.290970
> 
> modified 2014-03-24 12:42:59.231381
> 
> flags
> 
> pool 0 'data' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 84 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
> 
> pool 1 'metadata' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 86 owner 0 flags hashpspool stripe_width 0
> 
> pool 2 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 88 owner 0 flags hashpspool stripe_width 0
> 
> pool 4 'mycontainers_2' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 1200 pgp_num 1200 last_change 100 owner 0 flags hashpspool stripe_width 0
> 
> pool 5 'mycontainers_3' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1800 pgp_num 1800 last_change 94 owner 0 flags hashpspool stripe_width 0
> 
> pool 6 'mycontainers_1' erasure size 12 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 1800 pgp_num 1800 last_change 104 owner 0 flags hashpspool stripe_width 4320
> 
>  
> 
> However, the new PGs won’t come to a healthy state:
> 
> ceph@joceph-admin01:/etc/ceph$ ceph status
> 
>     cluster b12ebb71-e4a6-41fa-8246-71cbfa09fb6e
> 
>      health HEALTH_WARN 1800 pgs incomplete; 1800 pgs stuck inactive; 1800 pgs stuck unclean
> 
>      monmap e1: 2 mons at {mohonpeak01=10.0.0.101:6789/0,mohonpeak02=10.0.0.102:6789/0}, election epoch 4, quorum 0,1 mohonpeak01,mohonpeak02
> 
>      osdmap e106: 18 osds: 18 up, 18 in
> 
>       pgmap v261: 5184 pgs, 7 pools, 0 bytes data, 0 objects
> 
>             682 MB used, 15082 GB / 15083 GB avail
> 
>                 3384 active+clean
> 
>                 1800 incomplete
> 
>  
> 
> I think this is because it is using a failure domain of hosts and I only have 2 hosts (with 9 OSDs on each for 18 OSDs total).  I suspect I need to change the ruleset to use a failure domain of OSD instead of host.  This is also mentioned on this page: https://ceph.com/docs/master/dev/erasure-coded-pool/.
> 
>  
> 
> However, the guidance on that page to adjust it using commands of the form “ceph osd erasure-code-profile set myprofile” is not working for me.  As far as I can tell “ceph osd erasure-code-profile” does not seem to be valid command syntax.  Is this documentation correct and up to date for 0.78?  Can anyone suggest where I am going wrong?  Thanks!
> 
>  
> 
> ceph@joceph-admin01:/etc/ceph$ ceph osd erasure-code-profile ls
> 
> no valid command found; 10 closest matches:
> 
> osd tier add-cache <poolname> <poolname> <int[0-]>
> 
> osd tier set-overlay <poolname> <poolname>
> 
> osd tier remove-overlay <poolname>
> 
> osd tier remove <poolname> <poolname>
> 
> osd tier cache-mode <poolname> none|writeback|forward|readonly
> 
> osd thrash <int[0-]>
> 
> osd tier add <poolname> <poolname> {--force-nonempty}
> 
> osd stat
> 
> osd reweight-by-utilization {<int[100-]>}
> 
> osd pool stats {<name>}
> 
> Error EINVAL: invalid command
> 
> ceph@joceph-admin01:/etc/ceph$
> 
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
