Hi,
Hmm, I think you are summing up PGs from different pools on one OSD.
But for the one pool (.rgw.buckets) that holds almost all of my data, the
PG count per OSD also differs: for example, 105 vs 144 PGs from pool
.rgw.buckets. In the first case disk usage is 52%, in the second 74%.
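For reference, here is a rough sketch of one way to produce such per-pool,
per-OSD PG counts. It assumes 'ceph pg dump --format=json' exposes a
"pg_stats" list whose entries carry "pgid" and "acting"; the exact field
names can differ between Ceph releases, so adjust for your version:

    #!/usr/bin/env python
    # Sketch: count how many PGs of a single pool land on each OSD,
    # based on the JSON output of 'ceph pg dump'.
    import json
    import subprocess
    from collections import defaultdict

    POOL_ID = "3"   # pool 3 = .rgw.buckets in the 'ceph osd dump' below

    raw = subprocess.check_output(["ceph", "pg", "dump", "--format=json"])
    pg_stats = json.loads(raw.decode("utf-8")).get("pg_stats", [])

    per_osd = defaultdict(int)
    for pg in pg_stats:
        pool_id, _, _ = pg["pgid"].partition(".")   # pgid looks like "3.1a2f"
        if pool_id != POOL_ID:
            continue
        for osd in pg["acting"]:   # count every replica, not only the primary
            per_osd[osd] += 1

    for osd, count in sorted(per_osd.items(), key=lambda item: item[1]):
        print("osd.{0}: {1} PGs".format(osd, count))

Counting the whole acting set (not only the primary) is what corresponds to
the per-OSD numbers quoted below.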
--
Regards
Dominik

2014-02-02 Sage Weil <sage@xxxxxxxxxxx>:
> It occurs to me that this (and other unexplained variance reports) could
> easily be the 'hashpspool' flag not being set. The old behavior had the
> misfeature that consecutive pools' PGs would 'line up' on the same OSDs,
> so that 1.7 == 2.6 == 3.5 == 4.4 etc. would map to the same nodes. This
> tends to 'amplify' any variance in the placement. The default is still to
> use the old behavior for compatibility (this will finally change in
> firefly).
>
> You can do
>
>   ceph osd pool set <poolname> hashpspool true
>
> to enable the new placement logic on an existing pool, but be warned that
> this will rebalance *all* of the data in the pool, which can be a very
> heavyweight operation...
>
> sage
>
> On Sun, 2 Feb 2014, Dominik Mostowiec wrote:
>
>> Hi,
>> After scrubbing, almost all PGs have an equal(~) number of objects.
>> I found something else.
>> On one host, the PG counts per OSD (by pool) are:
>>
>> OSD with small (52%) disk usage:
>>   count  pool
>>     105  3
>>      18  4
>>       3  5
>>
>> OSD with larger (74%) disk usage:
>>     144  3
>>      31  4
>>       2  5
>>
>> Pool 3 is .rgw.buckets (where almost all of the data is).
>> Pool 4 is .log, which holds no data.
>>
>> Shouldn't the PG count be the same on every OSD?
>> Or maybe the PG hash algorithm is disrupted by the wrong PG count for
>> pool '4'? It has 1440 PGs (which is not a power of 2).
>>
>> ceph osd dump:
>> pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0 crash_replay_interval 45
>> pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
>> pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
>> pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner 0
>> pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1440 pgp_num 1440 last_change 28463 owner 0
>> pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
>> pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
>> pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
>> pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28467 owner 18446744073709551615
>> pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 28468 owner 18446744073709551615
>> pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner 18446744073709551615
>> pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33487 owner 18446744073709551615
>> pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
>> pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 46912 owner 0
>>
>> --
>> Regards
>> Dominik
>>
>> 2014-02-01 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
>> > Hi,
>> >> Did you bump pgp_num as well?
>> > Yes.
>> >
>> > See: http://dysk.onet.pl/link/BZ968
>> >
>> >> 25% of the pools are two times smaller than the others.
>> > This changes after scrubbing.
>> >
>> > --
>> > Regards
>> > Dominik
>> >
>> > 2014-02-01 Kyle Bader <kyle.bader@xxxxxxxxx>:
>> >>
>> >>> Changing pg_num for .rgw.buckets to a power of 2 and 'crush
>> >>> tunables optimal' didn't help :(
>> >>
>> >> Did you bump pgp_num as well? The split PGs will stay in place until
>> >> pgp_num is bumped too; if you do this, be prepared for (potentially
>> >> lots of) data movement.
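To pull the suggestions above together, a sketch of the commands involved
(the pool name .rgw.buckets is taken from the dump, and both raising
pgp_num and enabling hashpspool trigger large-scale data movement):

  # verify that pgp_num was bumped together with pg_num
  ceph osd pool get .rgw.buckets pg_num
  ceph osd pool get .rgw.buckets pgp_num

  # if pgp_num is still lower, raise it to match pg_num
  # (the split PGs only move once pgp_num is bumped)
  ceph osd pool set .rgw.buckets pgp_num 8192

  # optionally switch the pool to the newer placement hashing Sage describes;
  # this rebalances *all* of the data in the pool
  ceph osd pool set .rgw.buckets hashpspool true

In the dump above pg_num and pgp_num already match (8192), so only the
hashpspool change would still move data here.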