Thanks so much, Craig; this was really helpful and it now works as expected!

Have a nice day,
Fabrizio

On 3 May 2014 01:53, Craig Lewis <clewis at centraldesktop.com> wrote:
> On 5/2/14 05:15, Fabrizio G. Ventola wrote:
>
> Hello everybody,
> I'm running some tests with Ceph and its editable cluster map, and I'm
> trying to define a "rack" layer in its hierarchy like this:
>
> ceph osd tree:
>
> # id   weight  type name                 up/down  reweight
> -1     0.84    root default
> -7     0.28        rack rack1
> -2     0.14            host cephosd1-dev
> 0      0.14                osd.0         up       1
> -3     0.14            host cephosd2-dev
> 1      0.14                osd.1         up       1
> -8     0.28        rack rack2
> -4     0.14            host cephosd3-dev
> 2      0.14                osd.2         up       1
> -5     0.14            host cephosd4-dev
> 3      0.14                osd.3         up       1
> -9     0.28        rack rack3
> -6     0.28            host cephosd5-dev
> 4      0.28                osd.4         up       1
>
> These are my pools:
> pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2545 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 3 min_size 2 crush_ruleset 1 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2548 owner 0
> pool 2 'rbd' rep size 3 min_size 2 crush_ruleset 2 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2529 owner 0
> pool 4 'pool_01' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2542 owner 0
>
> All pools are configured with replica size 3 and min_size 2, so when I write
> new data on CephFS (through FUSE) or create a new RBD image, I expect to see
> the same amount of data on every rack (3 racks, 3 replicas -> 1 replica per
> rack). As you can see, the third rack has only one OSD (the first two have
> two each), so that single OSD should hold as much data as an entire one of
> the other racks. Instead, rack3 holds less data than the other racks (though
> more than any single OSD in the first two racks). Where am I going wrong?
>
> Thank you in advance,
> Fabrizio
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> You also need to edit the CRUSH rules to tell Ceph to choose a leaf from
> each rack, instead of the default host. If you run
>
>     ceph osd crush dump
>
> you'll see that rules 0, 1, and 2 use the operation chooseleaf_firstn with
> type host. Those rule numbers are referenced as crush_ruleset in the pool
> dump above.
>
> This should get you started on editing the crush map:
> https://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
>
> In the rules section of the decompiled map, change
>     step chooseleaf firstn 0 type host
> to
>     step chooseleaf firstn 0 type rack
>
> Then compile and set the new crushmap.
>
> A lot of data is going to start moving. This will give you a chance to use
> your cluster during a heavy recovery operation.
>
> --
> Craig Lewis
> Senior Systems Engineer
> clewis at centraldesktop.com
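
For reference, here is a minimal sketch of the workflow Craig describes, using
the standard getcrushmap / crushtool / setcrushmap commands from the linked
docs. The file names are just placeholders, and the exact rule text in your
decompiled map may differ slightly from what is shown in the comments:

    # Extract and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # In crushmap.txt, for each rule referenced by the pools above
    # (crush_ruleset 0, 1 and 2), change
    #     step chooseleaf firstn 0 type host
    # to
    #     step chooseleaf firstn 0 type rack

    # Recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin

Once the new map is in place, each of the 3 replicas is chosen from a
different rack, so the single OSD in rack3 (weight 0.28) ends up holding
roughly as much data as rack1 or rack2 as a whole, which matches the behaviour
Fabrizio was expecting.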