On Mon, 05 Jan 2015 13:53:56 +0100 Wido den Hollander wrote:

> On 01/05/2015 01:39 PM, ivan babrou wrote:
> > On 5 January 2015 at 14:20, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> >> On Mon, 5 Jan 2015 14:04:28 +0400 ivan babrou wrote:
> >>
> >>> Hi!
> >>>
> >>> I have a cluster with 106 OSDs and disk usage varies from 166 GB
> >>> to 316 GB. Disk usage is highly correlated to the number of PGs per
> >>> OSD (no surprise here). Is there a reason for Ceph to allocate more
> >>> PGs on some nodes?
> >>>
> >> In essence what Wido said, you're a bit low on PGs.
> >>
> >> Also given your current utilization, pool 14 is totally oversized with
> >> 1024 PGs. You might want to re-create it with a smaller size and
> >> double pool 0 to 512 PGs and 10 to 4096.
> >> I assume you did raise the PGPs as well when changing the PGs, right?
> >>
> >
> > Yep, pg = pgp for all pools. Pool 14 is just for testing purposes; it
> > might get large eventually.
> >
> > I followed your advice and doubled pools 0 and 10. It is rebalancing at
> > 30% degraded now, but so far the big OSDs are getting bigger and the
> > small ones smaller: http://i.imgur.com/hJcX9Us.png. I hope that trend
> > changes before rebalancing is complete.
> >
If this should persist you might be forced to manually reweight things,
which is of course a major pain...

> >
> >> And yeah, Ceph isn't particularly good at balancing stuff by itself,
> >> but with sufficient PGs you ought to get the variance below/around 30%.
> >>
> >
> > Is this going to change in future releases?
> >
That's a good question for the developers...

> Some things might change, but keep in mind that balancing happens based
> on object names and not sizes. Sizes would be impossible since those are
> dynamic.
>
I wonder if all those RGW objects are named very similarly and follow a
pattern that causes this imbalance.
Again, a word from the Ceph developers might clear this up.

Christian

> Wido
>
> >
> >> Christian
> >>
> >>> The biggest OSDs are 30, 42 and 69 (300 GB+ each) and the smallest
> >>> are 87, 33 and 55 (170 GB each). The biggest pool has 2048 PGs; pools
> >>> with very little data have only 8 PGs. PG size in the biggest pool is
> >>> ~6 GB (5.1..6.3 actually).
> >>>
> >>> Lack of balanced disk usage prevents me from using all the disk
> >>> space. When the biggest OSD is full, the cluster does not accept
> >>> writes anymore.
> >>>
> >>> Here's a gist with info about my cluster:
> >>> https://gist.github.com/bobrik/fb8ad1d7c38de0ff35ae
> >>>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
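
For reference, the pg_num/pgp_num increase and the manual reweighting
discussed in this thread boil down to commands along the following
lines. This is only a sketch: the pool name and the weight/threshold
values are placeholders, not values taken from this cluster.

  # Raise the PG count of a pool, then raise pgp_num to match so the
  # data actually remaps onto the new placement groups.
  ceph osd pool set <pool-name> pg_num 512
  ceph osd pool set <pool-name> pgp_num 512

  # Manually lower the weight of an overfull OSD (reweight takes a
  # value between 0.0 and 1.0), e.g. for osd.30 mentioned above:
  ceph osd reweight 30 0.85

  # Or let Ceph pick candidates itself, reweighting OSDs whose
  # utilization exceeds the given percentage of the cluster average:
  ceph osd reweight-by-utilization 120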