Re: poor data distribution

Hi,
Hmm,
I think you mean summing up PGs from different pools on one OSD.
But even for a single pool (.rgw.buckets), where almost all of my data
lives, the PG count per OSD differs.
For example, 105 vs 144 PGs from pool .rgw.buckets: the first OSD is at
52% disk usage, the second at 74%.
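
For anyone reproducing these per-pool numbers, the sketch below counts the
PGs of one pool per OSD from the plain 'ceph pg dump' output. It is only a
sketch: the column layout differs between releases, so it looks up the
'acting' column from the header line, and the counts include all replicas,
not just primaries.

  # count PGs of pool 3 (.rgw.buckets) per OSD
  ceph pg dump 2>/dev/null | awk '
    tolower($1) == "pg_stat" {
        for (i = 1; i <= NF; i++) if (tolower($i) == "acting") col = i
    }
    col && $1 ~ /^3\./ {                  # only PG ids of pool 3, e.g. 3.1a
        gsub(/[][]/, "", $col)            # acting set is printed like [12,34,56]
        n = split($col, osds, ",")
        for (j = 1; j <= n; j++) count[osds[j]]++
    }
    END { for (o in count) print count[o], "PGs of pool 3 on osd." o }
  ' | sort -rn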

--
Regards
Dominik


2014-02-02 Sage Weil <sage@xxxxxxxxxxx>:
> It occurs to me that this (and other unexplained variance reports) could
> easily be the 'hashpspool' flag not being set.  The old behavior had the
> misfeature where consecutive pools' PGs would 'line up' on the same OSDs,
> so that 1.7 == 2.6 == 3.5 == 4.4 etc. would map to the same nodes.  This
> tends to 'amplify' any variance in the placement.  The default is still to
> use the old behavior for compatibility (this will finally change in
> firefly).
>
> You can do
>
>  ceph osd pool set <poolname> hashpspool true
>
> to enable the new placement logic on an existing pool, but be warned that
> this will rebalance *all* of the data in the pool, which can be a very
> heavyweight operation...
>
> sage
>
>
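
Before committing to that rebalance, two things may be worth checking first.
Both are only sketches: the exact options and output depend on the ceph
release, and pool id 3 plus the /tmp/osdmap path are just examples.

  # Newer releases list the pool flags on the 'ceph osd dump' pool line
  # (e.g. "flags hashpspool" when the flag is set); the dump quoted below
  # shows no flags field at all, which suggests an older release.
  ceph osd dump | grep "^pool 3 "

  # If your osdmaptool supports --test-map-pgs (check 'osdmaptool --help'),
  # the current PG->OSD spread can be measured offline from an exported
  # osdmap, without moving any data.
  ceph osd getmap -o /tmp/osdmap
  osdmaptool /tmp/osdmap --test-map-pgs --pool 3
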
> On Sun, 2 Feb 2014, Dominik Mostowiec wrote:
>
>> Hi,
>> After scrubbing, almost all PGs have a roughly equal number of objects.
>> I found something else.
>> On one host, the PG count per OSD is:
>> OSD with small (52%) disk usage:
>> count, pool
>>     105 3
>>      18 4
>>       3 5
>>
>> OSD with larger (74%) disk usage:
>>     144 3
>>      31 4
>>       2 5
>>
>> Pool 3 is .rgw.buckets (where almost all of the data is).
>> Pool 4 is .log, which holds no data.
>>
>> Shouldn't the PG count be the same on every OSD?
>> Or maybe the PG hash algorithm is disturbed by the wrong PG count of pool
>> '4'? It has 1440 PGs (which is not a power of 2).
>>
>> ceph osd dump:
>> pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
>> crash_replay_interval 45
>> pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
>> pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
>> rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
>> pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
>> 0
>> pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 1440 pgp_num 1440 last_change 28463 owner 0
>> pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
>> pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
>> pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
>> pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 28467 owner
>> 18446744073709551615
>> pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 28468 owner
>> 18446744073709551615
>> pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0
>> object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner
>> 18446744073709551615
>> pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 33487 owner
>> 18446744073709551615
>> pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash
>> rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
>> pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
>> pg_num 8 pgp_num 8 last_change 46912 owner 0
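
Side note on the non-power-of-two pg_num: a quick way to spot such pools in
a dump like the one above (only a sketch, assuming the plain-text field
layout shown here) is:

  # list pools whose pg_num is not a power of two
  ceph osd dump | awk '
    /^pool / {
        for (i = 1; i <= NF; i++) if ($i == "pg_num") n = $(i + 1)
        p = n; while (p > 1 && p % 2 == 0) p /= 2
        if (p != 1) print "pool", $2, $3, "pg_num", n   # p ends at 1 only for powers of two
    }'

On the dump above this flags only pool 4 ('.log') with pg_num 1440.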
>>
>> --
>> Regards
>> Dominik
>>
>> 2014-02-01 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
>> > Hi,
>> >> Did you bump pgp_num as well?
>> > Yes.
>> >
>> > See: http://dysk.onet.pl/link/BZ968
>> >
>> >> 25% of the pools are two times smaller than the others.
>> > This is changing after scrubbing.
>> >
>> > --
>> > Regards
>> > Dominik
>> >
>> > 2014-02-01 Kyle Bader <kyle.bader@xxxxxxxxx>:
>> >>
>> >>> Changing pg_num for .rgw.buckets to a power of 2 and 'crush tunables
>> >>> optimal' didn't help :(
>> >>
>> >> Did you bump pgp_num as well? The split PGs will stay in place until
>> >> pgp_num is bumped too; if you do this, be prepared for (potentially
>> >> lots of) data movement.
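
For the archive: the pgp_num bump Kyle means is simply something like

  ceph osd pool set .rgw.buckets pgp_num 8192

with 8192 matching the pool's pg_num from the osd dump above; as he warns,
this is the step that triggers the (potentially large) data movement.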
>> >
>> >
>> >
>> > --
>> > Regards
>> > Dominik
>>
>>
>>
>> --
>> Regards
>> Dominik
>>
>>



-- 
Regards
Dominik
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



