Thanks so much, Craig; this was really helpful and it now works as expected!

Have a nice day,
Fabrizio

On 3 May 2014 01:53, Craig Lewis <clewis at centraldesktop.com> wrote:
> On 5/2/14 05:15, Fabrizio G. Ventola wrote:
>
> Hello everybody,
> I'm running some tests with Ceph and its editable cluster map, and I'm
> trying to define a "rack" layer in its hierarchy like this:
>
> ceph osd tree:
>
> # id   weight  type name                 up/down  reweight
> -1     0.84    root default
> -7     0.28        rack rack1
> -2     0.14            host cephosd1-dev
> 0      0.14                osd.0         up       1
> -3     0.14            host cephosd2-dev
> 1      0.14                osd.1         up       1
> -8     0.28        rack rack2
> -4     0.14            host cephosd3-dev
> 2      0.14                osd.2         up       1
> -5     0.14            host cephosd4-dev
> 3      0.14                osd.3         up       1
> -9     0.28        rack rack3
> -6     0.28            host cephosd5-dev
> 4      0.28                osd.4         up       1
>
> These are my pools:
> pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2545 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 3 min_size 2 crush_ruleset 1 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2548 owner 0
> pool 2 'rbd' rep size 3 min_size 2 crush_ruleset 2 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2529 owner 0
> pool 4 'pool_01' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
>   pg_num 333 pgp_num 333 last_change 2542 owner 0
>
> All pools are configured with replica size 3 and min_size 2, so when I write
> new data on CephFS (through FUSE) or create a new RBD image, I expect to see
> the same amount of data on every rack (3 racks, 3 replicas -> 1 replica per
> rack). As you can see, the third rack has only one OSD (the first two have
> two each), so that single OSD should hold as much data as an entire one of
> the other racks. Instead, rack3 holds less data than the other racks (though
> more than any single OSD in the first two racks). Where am I going wrong?
>
> Thank you in advance,
> Fabrizio
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> You also need to edit the CRUSH rules to tell Ceph to choose a leaf from
> each rack, instead of the default host. If you run
>
>     ceph osd crush dump
>
> you'll see that rules 0, 1, and 2 use the operation chooseleaf_firstn with
> type host. Those rule numbers are referenced as crush_ruleset in the pool
> dump above.
>
> This should get you started on editing the crush map:
> https://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
>
> In the rules section of the decompiled map, change
>     step chooseleaf firstn 0 type host
> to
>     step chooseleaf firstn 0 type rack
>
> Then compile and set the new crushmap.
>
> A lot of data is going to start moving. This will give you a chance to use
> your cluster during a heavy recovery operation.
>
> --
> Craig Lewis
> Senior Systems Engineer
> clewis at centraldesktop.com
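
For reference, here is a minimal sketch of the workflow Craig describes, using
the standard getcrushmap / crushtool / setcrushmap commands from the linked
docs. The file names are just placeholders, and the exact rule text in your
decompiled map may differ slightly from what is shown in the comments:

    # Extract and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # In crushmap.txt, for each rule referenced by the pools above
    # (crush_ruleset 0, 1 and 2), change
    #     step chooseleaf firstn 0 type host
    # to
    #     step chooseleaf firstn 0 type rack

    # Recompile and inject the modified map
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin

Once the new map is in place, each of the 3 replicas is chosen from a
different rack, so the single OSD in rack3 (weight 0.28) ends up holding
roughly as much data as rack1 or rack2 as a whole, which matches the behaviour
Fabrizio was expecting.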