Re: poor data distribution

It occurs to me that this (and other unexplained variance reports) could 
easily be the 'hashpspool' flag not being set.  The old behavior had the 
misfeature where consecutive pools' PGs would 'line up' on the same OSDs, 
so that 1.7 == 2.6 == 3.5 == 4.4 etc. would map to the same nodes.  This 
tends to 'amplify' any variance in the placement.  The default is still to 
use the old behavior for compatibility (this will finally change in 
firefly).
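
To make the 'lining up' concrete, here is a toy sketch of the two seeding 
schemes.  This is not the actual Ceph code -- sha1 just stands in for the 
real rjenkins hash -- but it shows why a seed of pool_id + ps collides for 
consecutive pools while a hashed seed does not:

  import hashlib

  def legacy_seed(pool_id, ps):
      # old behavior: the seed is effectively pool_id + ps, so
      # pg 1.7, 2.6, 3.5 and 4.4 all get the same seed and land
      # on the same OSDs
      return pool_id + ps

  def hashpspool_seed(pool_id, ps):
      # hashpspool behavior (conceptually): the pool id is hashed
      # together with the ps, so consecutive pools no longer collide
      digest = hashlib.sha1(f"{pool_id}.{ps}".encode()).hexdigest()
      return int(digest, 16) % 2**32

  for pool_id, ps in [(1, 7), (2, 6), (3, 5), (4, 4)]:
      print(f"pg {pool_id}.{ps}: legacy={legacy_seed(pool_id, ps)}, "
            f"hashpspool={hashpspool_seed(pool_id, ps)}")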

You can do

 ceph osd pool set <poolname> hashpspool true

to enable the new placement logic on an existing pool, but be warned that 
this will rebalance *all* of the data in the pool, which can be a very 
heavyweight operation...
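
If you want to compare the per-pool PG counts per OSD before and after the 
rebalance (like the per-host breakdown quoted below), something along these 
lines should do it.  This is only a rough sketch: it assumes the JSON from 
'ceph pg dump --format=json' carries a 'pg_stats' list (possibly nested 
under 'pg_map') whose entries have 'pgid' and 'up' fields; the exact layout 
varies by release, so adjust the keys if needed.

  # pipe 'ceph pg dump --format=json' into this script
  import json, sys
  from collections import Counter, defaultdict

  dump = json.load(sys.stdin)
  stats = dump.get("pg_stats") or dump.get("pg_map", {}).get("pg_stats", [])

  per_osd = defaultdict(Counter)            # osd -> pool -> pg count
  for pg in stats:
      pool = pg["pgid"].split(".")[0]       # pgid looks like "<pool>.<ps>"
      for osd in pg["up"]:
          per_osd[osd][pool] += 1

  for osd in sorted(per_osd):
      by_pool = ", ".join(f"pool {p}: {n}"
                          for p, n in sorted(per_osd[osd].items()))
      print(f"osd.{osd}: {sum(per_osd[osd].values())} PGs ({by_pool})")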

sage


On Sun, 2 Feb 2014, Dominik Mostowiec wrote:

> Hi,
> After scrubbing, almost all PGs have a roughly equal number of objects.
> I found something else.
> On one host, the PG count per OSD:
> OSD with small (52%) disk usage:
> count, pool
>     105 3
>      18 4
>       3 5
> 
> OSD with larger (74%) disk usage:
> count, pool
>     144 3
>      31 4
>       2 5
> 
> Pool 3 is .rgw.buckets (where almost all of the data is).
> Pool 4 is .log, which holds no data.
> 
> Shouldn't the PG count per OSD be (roughly) the same?
> Or maybe the PG hash algorithm is disrupted by the wrong PG count for
> pool '4'?  It has 1440 PGs (which is not a power of 2).
> 
> ceph osd dump:
> pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
> crash_replay_interval 45
> pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
> pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
> pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
> 0
> pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 1440 pgp_num 1440 last_change 28463 owner 0
> pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
> pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
> pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
> pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28467 owner
> 18446744073709551615
> pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 28468 owner
> 18446744073709551615
> pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner
> 18446744073709551615
> pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 33487 owner
> 18446744073709551615
> pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
> pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
> pg_num 8 pgp_num 8 last_change 46912 owner 0
> 
> --
> Regards
> Dominik
> 
> 2014-02-01 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
> > Hi,
> >> Did you bump pgp_num as well?
> > Yes.
> >
> > See: http://dysk.onet.pl/link/BZ968
> >
> >> 25% of the pools are two times smaller than the others.
> > This changes after scrubbing.
> >
> > --
> > Regards
> > Dominik
> >
> > 2014-02-01 Kyle Bader <kyle.bader@xxxxxxxxx>:
> >>
> >>> Changing pg_num for .rgw.buckets to a power of 2 and 'crush tunables
> >>> optimal' didn't help :(
> >>
> >> Did you bump pgp_num as well? The split PGs will stay in place until
> >> pgp_num is bumped as well; if you do that, be prepared for (potentially
> >> a lot of) data movement.
> >
> >
> >
> > --
> > Regards
> > Dominik
> 
> 
> 
> -- 
> Regards
> Dominik
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



