It occurs to me that this (and other unexplained variance reports) could
easily be caused by the 'hashpspool' flag not being set. The old behavior
had the misfeature where consecutive pools' PGs would 'line up' on the
same OSDs, so that 1.7 == 2.6 == 3.5 == 4.4 etc. would map to the same
nodes. This tends to 'amplify' any variance in the placement. The default
is still to use the old behavior for compatibility (this will finally
change in firefly).

You can do

 ceph osd pool set <poolname> hashpspool true

to enable the new placement logic on an existing pool, but be warned that
this will rebalance *all* of the data in the pool, which can be a very
heavyweight operation...

sage


On Sun, 2 Feb 2014, Dominik Mostowiec wrote:

> Hi,
> After scrubbing, almost all PGs have an equal(~) number of objects.
> I found something else.
> On one host, PG count on OSDs:
> OSD with small (52%) disk usage:
> count, pool
>    105 3
>     18 4
>      3 5
>
> OSD with larger (74%) disk usage:
>    144 3
>     31 4
>      2 5
>
> Pool 3 is .rgw.buckets (where almost all of the data is).
> Pool 4 is .log, where there is no data.
>
> Shouldn't the count of PGs be the same per OSD?
> Or maybe the PG hash algorithm is disrupted by the wrong count of PGs
> for pool '4'. There are 1440 PGs (this is not a power of 2).
>
> ceph osd dump:
> pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
> crash_replay_interval 45
> pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
> pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
> pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
> 0
> pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 1440 pgp_num 1440 last_change 28463 owner 0
> pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
> pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
> pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
> pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28467 owner
> 18446744073709551615
> pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28468 owner
> 18446744073709551615
> pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner
> 18446744073709551615
> pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 33487 owner
> 18446744073709551615
> pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
> pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
> pg_num 8 pgp_num 8 last_change 46912 owner 0
>
> --
> Regards
> Dominik
>
> 2014-02-01 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
> > Hi,
> >> Did you bump pgp_num as well?
> > Yes.
> >
> > See: http://dysk.onet.pl/link/BZ968
> >
> >> 25% pools is two times smaller from other.
> > This is changing after scrubbing.
> >
> > --
> > Regards
> > Dominik
> >
> > 2014-02-01 Kyle Bader <kyle.bader@xxxxxxxxx>:
> >>
> >>> Change pg_num for .rgw.buckets to power of 2, and 'crush tunables
> >>> optimal' didn't help :(
> >>
> >> Did you bump pgp_num as well? The split pgs will stay in place until pgp_num
> >> is bumped as well, if you do this be prepared for (potentially lots) of data
> >> movement.
> >
> >
> >
> > --
> > Regards
> > Dominik
> >
>
> --
> Regards
> Dominik
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
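
To illustrate the 'line up' effect described at the top of this message,
here is a minimal Python sketch of how the placement seed for a PG could
be derived with and without hashpspool. It assumes the legacy behavior adds
the pool id to the folded PG number while the new behavior hashes the two
together. The function names (stable_mod, mix, placement_seed) are
hypothetical, and mix() is a stand-in built on sha256, not Ceph's actual
rjenkins hash, so the hashed seeds will not match a real cluster.

```python
import hashlib

def stable_mod(x, b, bmask):
    """Fold x into [0, b) so that the mapping stays stable when b is not a power of 2."""
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

def mix(a, b):
    """Stand-in 32-bit hash of two integers (assumption: NOT Ceph's rjenkins hash)."""
    digest = hashlib.sha256(f"{a}:{b}".encode()).digest()
    return int.from_bytes(digest[:4], "little")

def placement_seed(pool, ps, pgp_num, hashpspool):
    """Return the seed that would be handed to CRUSH for PG <pool>.<ps>."""
    bmask = (1 << (pgp_num - 1).bit_length()) - 1
    folded = stable_mod(ps, pgp_num, bmask)
    if hashpspool:
        # New behavior: pool id and PG number are hashed together.
        return mix(folded, pool)
    # Old behavior: pool id is simply added, so neighbouring pools overlap.
    return folded + pool

# The PGs from the example above: 1.7, 2.6, 3.5, 4.4
pgs = [(1, 7), (2, 6), (3, 5), (4, 4)]
old_seeds = [placement_seed(pool, ps, 64, hashpspool=False) for pool, ps in pgs]
new_seeds = [placement_seed(pool, ps, 64, hashpspool=True) for pool, ps in pgs]

print(old_seeds)  # [8, 8, 8, 8]: same seed, hence the same OSDs
print(new_seeds)  # four unrelated values
```

With the old scheme all four PGs share seed 8, so any imbalance in one
pool is repeated in the others. stable_mod() also shows why a pg_num such
as 1440 is uneven: hash values that land in 1440..2047 are folded back
onto PGs 416..1023, which therefore receive roughly twice as many objects
as the rest.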