Re: Getting placement groups to place evenly (again)


 



On Wed, Apr 8, 2015 at 9:45 AM, J David <j.david.lists@xxxxxxxxx> wrote:
> On Wed, Apr 8, 2015 at 11:40 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> "ceph pg dump" will output the size of each pg, among other things.
>
> Among many other things. :)
>
> Here is the raw output, in case I'm misinterpreting it:
>
> http://pastebin.com/j4ySNBdQ
>
> It *looks* like the pg's are roughly uniform in size.  They range from
> 2.27GiB to 2.91GiB with an average of 2.58GiB and a standard deviation
> of only 0.1GiB, and it looks like about 95% are within two standard
> deviations.  The difference between the least used and most used OSDs
> is on the order of 100+GB.  A few placement groups being a few hundred
> megs bigger or smaller doesn't seem like it would account for that.
>
> Breaking out the placement groups per OSD was a bit trickier, but this
> seems to do the trick:
>
> egrep '^2\.' ceph-pg-dump.txt  | awk '{print$14}' | tr -d '[]' | awk
> -F, '{print$1"\n"$2}' | sort | uniq -c | sort -n | awk '{print$2"
> "$1}' | sort -n
>
> That showed that the OSD with the fewest PG's has 85 (and indeed that
> is the lowest space-utilized OSD), and the OSD with the most PG's has
> 118.  That's not the OSD with the highest utilization, but the OSD with
> the highest utilization does check in with 117 PG's.
>
> So it does seem more of an issue of allocating placement groups
> unevenly between OSDs than it does of unevenly sized placement groups.

Okay, but 118/85 = 1.38. You say you're seeing variance from 53%
utilization to 96%, and 53%*1.38 = 73.5%, which is *way* off your
numbers.
Now 2.91GB/2.27GB is 1.28, which isn't actually that much less
variance, but either you've gotten very unlucky with where the light
and heavy PGs are or something else is going on. I don't have enough
statistics intuition to make a guess as to which. (You could actually
add up all the PG sizes on each node and see if they match or not, if
you were really ambitious. But it might just be faster to look for
anomalies within the size of important bits on the OSD — leveldb
stores, etc that don't correspond to the PG count).
-Greg
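[For anyone wanting to try the "really ambitious" option Greg describes, summing PG sizes per OSD from a saved `ceph pg dump`, something like the awk below should do it. The miniature input file is made up for illustration, the acting set in column 14 comes from J David's pipeline above, and the PG bytes being in column 7 is an assumption; check both column positions against your own dump before trusting the totals.]

```shell
# Hypothetical miniature of "ceph pg dump" output for pool 2: PG id in
# column 1, bytes ASSUMED to be in column 7, acting set in column 14
# (as in the pipeline earlier in the thread). Verify the positions
# against a real dump.
cat > /tmp/ceph-pg-dump-sample.txt <<'EOF'
2.0 x x x x x 1000 x x x x x x [0,1]
2.1 x x x x x 2000 x x x x x x [1,2]
2.2 x x x x x 3000 x x x x x x [0,2]
EOF

# Sum each PG's bytes onto every OSD in its acting set, then print totals.
egrep '^2\.' /tmp/ceph-pg-dump-sample.txt | awk '
{
    gsub(/[][]/, "", $14)        # strip the brackets around the acting set
    n = split($14, osds, ",")    # one entry per replica
    for (i = 1; i <= n; i++)
        bytes[osds[i]] += $7     # charge this PG size to each of its OSDs
}
END {
    for (o in bytes)
        printf "osd.%s %d\n", o, bytes[o]
}' | sort | tee /tmp/per-osd-bytes.txt
```

On a real dump, comparing those per-OSD totals against the actual disk utilization of each OSD would show whether PG contents explain the imbalance or whether, as Greg suggests, something outside the PGs (leveldb stores, etc.) is eating the space.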
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




