Re: PG distribution scattered


Is this safe to enable on a running cluster?

--
Warren

On Sep 19, 2013, at 9:43 AM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

> On 09/19/2013 08:36 AM, Niklas Goerke wrote:
>> Hi there
>> 
>> I'm currently evaluating ceph and started filling my cluster for the
>> first time. After filling it up to about 75%, it reported some OSDs
>> being "near-full".
>> After some evaluation I found that the PGs are not distributed evenly
>> over all the osds.
>> 
>> My Setup:
>> * Two Hosts with 45 Disks each --> 90 OSDs
>> * Only one newly created pool with 4500 PGs and a Replica Size of 2 -->
>> should be about 100 PGs per OSD
>> 
>> What I found was that one OSD had only 72 PGs, while another had 123 PGs
>> [1]. That means that - if I did the math correctly - I can only fill the
>> cluster to about 81%, because that's when the first OSD is completely
>> full [2].
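A quick check of that estimate, assuming equally sized OSDs and roughly equally sized PGs: 4500 PGs with 2 replicas across 90 OSDs gives an average of 100 PGs per OSD, so the first OSD fills up at about 100/123 of raw capacity.

    # Back-of-the-envelope check of the ~81% figure above (assumes
    # equal-sized OSDs and roughly equal-sized PGs).
    avg_pgs = 4500 * 2 / 90.0      # average PGs per OSD = 100
    max_pgs = 123                  # most heavily loaded OSD reported above
    print("usable before first OSD is full: %.0f%%" % (100 * avg_pgs / max_pgs))
    # -> 81%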
> 
> Does distribution improve if you make a pool with significantly more PGs?
> 
>> 
>> I did some experimenting and found that if I add another pool with 4500
>> PGs, each OSD ends up with exactly double the number of PGs it had with
>> one pool. So this is not an accident (I tried it multiple times). On
>> another test cluster with 4 hosts and 15 disks each, the distribution was
>> similarly uneven.
> 
> This is a bug that causes each pool to be distributed more or less the same way across the same hosts.  We have a fix, but it impacts backwards compatibility, so it's off by default.  If you set:
> 
> osd pool default flag hashpspool = true
> 
> Theoretically that will cause different pools to be distributed more randomly.
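For illustration, that default would typically be set in ceph.conf; the [global] placement shown here is an assumption, not something stated in the thread:

    [global]
        osd pool default flag hashpspool = true

Since it is a pool-creation default, it would presumably only take effect for pools created after the change; existing pools keep their current flags and placement.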
> 
>> 
>> To me it looks like the rjenkins algorithm is not working the way it - in
>> my opinion - should.
>> 
>> Am I doing anything wrong?
>> Is this behaviour to be expected?
>> Can I do something about it?
>> 
>> 
>> Thank you very much in advance
>> Niklas
>> 
>> 
>> [1] I built a small script that parses pg dump and outputs the number of
>> PGs on each OSD: http://pastebin.com/5ZVqhy5M
>> [2] I know I should not fill my cluster completely, but I'm talking about
>> theory here, and adding a margin only makes it worse.
>> 
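The pastebin script in [1] is not reproduced here; a minimal sketch of the same idea in Python, assuming the JSON form of pg dump exposes a pg_stats array with a per-PG "up" OSD list (field names can vary between Ceph versions):

    # Count how many PGs map to each OSD.
    # Usage: ceph pg dump --format=json | python count_pgs.py
    import json, sys
    from collections import Counter

    counts = Counter(osd
                     for pg in json.load(sys.stdin)["pg_stats"]
                     for osd in pg["up"])
    for osd, n in sorted(counts.items()):
        print("osd.%d has %d PGs" % (osd, n))
    print("min %d, max %d" % (min(counts.values()), max(counts.values())))

Run against the cluster described above, the min/max line makes the imbalance (72 vs 123) easy to spot.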
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



