Is this safe to enable on a running cluster?

-- Warren

On Sep 19, 2013, at 9:43 AM, Mark Nelson <mark.nelson@xxxxxxxxxxx> wrote:

> On 09/19/2013 08:36 AM, Niklas Goerke wrote:
>> Hi there
>>
>> I'm currently evaluating Ceph and started filling my cluster for the
>> first time. After filling it up to about 75%, it reported some OSDs
>> as being "near-full". After some investigation I found that the PGs
>> are not distributed evenly across the OSDs.
>>
>> My setup:
>> * Two hosts with 45 disks each --> 90 OSDs
>> * Only one newly created pool with 4500 PGs and a replica size of 2
>>   --> 4500 x 2 / 90 OSDs, so about 100 PGs per OSD
>>
>> What I found was that one OSD had only 72 PGs, while another had 123
>> PGs [1]. That means that - if I did the math correctly - I can only
>> fill the cluster to about 81%, because that's when the first OSD is
>> completely full [2] (the fullest OSD holds 123 PGs against an average
>> of 100, and 100/123 is roughly 81%).
>
> Does distribution improve if you make a pool with significantly more
> PGs?
>
>> I did some experimenting and found that if I add another pool with
>> 4500 PGs, each OSD ends up with exactly double the number of PGs it
>> had with one pool. So this is not an accident (I tried it multiple
>> times). On another test cluster with 4 hosts of 15 disks each, the
>> distribution was similarly uneven.
>
> This is a bug that causes each pool to be distributed in more or less
> the same way across the same hosts. We have a fix, but it impacts
> backwards compatibility, so it's off by default. If you set:
>
> osd pool default flag hashpspool = true
>
> in theory that will cause different pools to be distributed
> independently of one another, rather than stacking on the same OSDs.
>
>> To me it looks like the rjenkins algorithm is not working as it - in
>> my opinion - should.
>>
>> Am I doing anything wrong?
>> Is this behaviour to be expected?
>> Can I do something about it?
>>
>> Thank you very much in advance
>> Niklas
>>
>> [1] I built a small script that parses pg dump and outputs the number
>> of PGs on each OSD: http://pastebin.com/5ZVqhy5M
>> [2] I know I should not fill my cluster completely, but I'm talking
>> about theory, and adding a safety margin only makes it worse.
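
For anyone who wants the per-OSD PG counts without the pastebin script
from [1], below is a minimal sketch of the same idea in Python. It is
not Niklas's script; it assumes a `ceph pg dump --format=json` layout
in which a top-level "pg_stats" list carries an "up" list of OSD ids
per PG (the Dumpling-era shape - field names vary between releases, so
adjust for yours):

    #!/usr/bin/env python
    # Sketch: tally how many PGs (including replicas) map to each OSD.
    # Assumption: the JSON from `ceph pg dump --format=json` has a
    # top-level "pg_stats" list whose entries carry an "up" list of
    # OSD ids; adjust the field names for your Ceph release.
    import json
    import subprocess
    from collections import Counter

    dump = json.loads(subprocess.check_output(
        ["ceph", "pg", "dump", "--format=json"]))

    counts = Counter()
    for pg in dump["pg_stats"]:
        for osd in pg["up"]:   # each replica counts against its OSD
            counts[osd] += 1

    for osd, pgs in sorted(counts.items()):
        print("osd.%d: %d PGs" % (osd, pgs))

    # The fullest OSD caps usable capacity: with `avg` PGs per OSD on
    # average, the cluster is effectively full at avg / max - the
    # 100/123 ~= 81% figure from the thread.
    avg = sum(counts.values()) / float(len(counts))
    print("usable fraction ~%.0f%%" % (100.0 * avg / max(counts.values())))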
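
For reference, the option Mark quotes is a ceph.conf setting. A minimal
sketch of where it would sit, assuming the usual [global] section; my
understanding (not stated in the thread) is that this default flag only
marks pools created after it is set, which is presumably why Warren
asks about enabling it on a running cluster:

    # ceph.conf -- sketch: distribute newly created pools independently.
    # Assumption: the default flag applies at pool-creation time, so
    # existing pools keep their current placement.
    [global]
        osd pool default flag hashpspool = true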