On 5/2/14 05:15, Fabrizio G. Ventola wrote:
> Hello everybody,
> I'm making some tests with ceph and its editable cluster map and I'm
> trying to define a "rack" layer for its hierarchy in this way:
>
> ceph osd tree:
>
> # id   weight  type name                       up/down  reweight
> -1     0.84    root default
> -7     0.28        rack rack1
> -2     0.14            host cephosd1-dev
> 0      0.14                osd.0               up       1
> -3     0.14            host cephosd2-dev
> 1      0.14                osd.1               up       1
> -8     0.28        rack rack2
> -4     0.14            host cephosd3-dev
> 2      0.14                osd.2               up       1
> -5     0.14            host cephosd4-dev
> 3      0.14                osd.3               up       1
> -9     0.28        rack rack3
> -6     0.28            host cephosd5-dev
> 4      0.28                osd.4               up       1
>
> Those are my pools:
> pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 333 pgp_num 333 last_change 2545 owner 0
> crash_replay_interval 45
> pool 1 'metadata' rep size 3 min_size 2 crush_ruleset 1 object_hash
> rjenkins pg_num 333 pgp_num 333 last_change 2548 owner 0
> pool 2 'rbd' rep size 3 min_size 2 crush_ruleset 2 object_hash
> rjenkins pg_num 333 pgp_num 333 last_change 2529 owner 0
> pool 4 'pool_01' rep size 3 min_size 2 crush_ruleset 0 object_hash
> rjenkins pg_num 333 pgp_num 333 last_change 2542 owner 0
>
> I configured replica 3 for all pools and min_size 2, thus I'm
> expecting when I write new data on ceph-fs (through FUSE) or when I
> make a new RBD to see the same amount of data on every rack (3 racks,
> 3 replicas -> 1 replica per rack). But as you can see the third rack
> has just one OSD (the first two have two by the way) and should have
> the rack1+rack2 amount of data. Instead it has less data than the
> other racks (but more than one single OSD of the first two racks).
> Where am I wrong?
>
> Thank you in advance,
> Fabrizio

You also need to edit the crush rules to tell them to choose a leaf from each rack, instead of from each host (the default).
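The whole get / decompile / edit / recompile / set cycle can be sketched as a small bash script. This is a minimal sketch, assuming the `ceph` and `crushtool` binaries are on the PATH with admin credentials. The function names `host_to_rack` and `apply_rack_rules` and the temp-file names are hypothetical, and the sed pattern assumes the rules contain exactly the `step chooseleaf firstn 0 type host` line discussed in this thread. The `crushtool --test` flags are the ones found in recent crushtool builds; check `crushtool --help` on your version before relying on them.

```shell
#!/usr/bin/env bash
# Sketch only: move the failure domain of the CRUSH rules from host to rack.
# Function names and temp-file names are hypothetical, not part of Ceph.
set -euo pipefail

# Rewrite "step chooseleaf firstn 0 type host" to "... type rack" in a
# decompiled (text) CRUSH map. All other lines are copied unchanged.
#   $1 = decompiled map to read, $2 = edited map to write
host_to_rack() {
    sed -E 's/^([[:space:]]*step chooseleaf firstn 0 type) host$/\1 rack/' \
        "$1" > "$2"
}

# Full cycle against a live cluster. Requires the ceph and crushtool
# binaries plus admin credentials; nothing runs unless this is called.
apply_rack_rules() {
    local work
    work=$(mktemp -d)

    ceph osd getcrushmap -o "$work/crush.bin"            # fetch binary map
    crushtool -d "$work/crush.bin" -o "$work/crush.txt"  # decompile to text
    host_to_rack "$work/crush.txt" "$work/crush-rack.txt"
    crushtool -c "$work/crush-rack.txt" -o "$work/crush-rack.bin"  # recompile

    # Offline dry run, no cluster change: rules 0-2 are the crush_ruleset
    # ids used by the pools in this thread. Inspect the statistics output
    # (it reports how many inputs got the requested 3 results).
    for rule in 0 1 2; do
        crushtool -i "$work/crush-rack.bin" --test --rule "$rule" \
            --num-rep 3 --show-statistics
    done

    # Injecting the new map is the point where data starts to move.
    ceph osd setcrushmap -i "$work/crush-rack.bin"
}
```

Calling `host_to_rack crush.txt crush-rack.txt` on its own only edits text, so you can diff the result before going further; `apply_rack_rules` is the part that touches the cluster, and its final `setcrushmap` is where the rebalancing begins.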
If you run

    ceph osd crush dump

you'll see that rules 0, 1, and 2 use the operation chooseleaf_firstn with type host. Those are the rule numbers referenced by crush_ruleset in your pool listing above.

This should get you started on editing the crush map:
https://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map

In the rules section of the decompiled map, change

    step chooseleaf firstn 0 type host

to

    step chooseleaf firstn 0 type rack

Then compile and set the new crushmap. A lot of data is going to start moving. This will give you a chance to use your cluster during a heavy recovery operation.

--
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com
Central Desktop