On Mon, 05 Jan 2015 13:53:56 +0100 Wido den Hollander wrote:

> On 01/05/2015 01:39 PM, ivan babrou wrote:
> > On 5 January 2015 at 14:20, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> >> On Mon, 5 Jan 2015 14:04:28 +0400 ivan babrou wrote:
> >>
> >>> Hi!
> >>>
> >>> I have a cluster with 106 OSDs and disk usage varies from 166 GB
> >>> to 316 GB. Disk usage is highly correlated to the number of PGs per
> >>> OSD (no surprise here). Is there a reason for Ceph to allocate more
> >>> PGs on some nodes?
> >>>
> >> In essence what Wido said, you're a bit low on PGs.
> >>
> >> Also given your current utilization, pool 14 is totally oversized with
> >> 1024 PGs. You might want to re-create it with a smaller size and
> >> double pool 0 to 512 PGs and 10 to 4096.
> >> I assume you did raise the PGPs as well when changing the PGs, right?
> >>
> >
> > Yep, pg = pgp for all pools. Pool 14 is just for testing purposes; it
> > might get large eventually.
> >
> > I followed your advice and doubled pools 0 and 10. It is rebalancing at
> > 30% degraded now, but so far the big OSDs are getting bigger and the
> > small ones smaller: http://i.imgur.com/hJcX9Us.png. I hope that trend
> > changes before rebalancing is complete.
> >
If this should persist you might be forced to manually reweight things,
which is of course a major pain...

> >
> >> And yeah, Ceph isn't particularly good at balancing stuff by itself,
> >> but with sufficient PGs you ought to get the variance below/around 30%.
> >>
> >
> > Is this going to change in future releases?
> >
That's a good question for the developers...

> Some things might change, but keep in mind that balancing happens based
> on object names and not sizes. Sizes would be impossible since those are
> dynamic.
>
I wonder if all those RGW objects are named very similarly and follow a
pattern that causes this imbalance.
Again, a word from the Ceph developers might clear this up.

Christian

> Wido
>
> >
> >> Christian
> >>
> >>> The biggest OSDs are 30, 42 and 69 (300 GB+ each) and the smallest
> >>> are 87, 33 and 55 (170 GB each). The biggest pool has 2048 PGs; pools
> >>> with very little data have only 8 PGs. PG size in the biggest pool is
> >>> ~6 GB (5.1..6.3 actually).
> >>>
> >>> Lack of balanced disk usage prevents me from using all the disk
> >>> space. When the biggest OSD is full, the cluster does not accept
> >>> writes anymore.
> >>>
> >>> Here's a gist with info about my cluster:
> >>> https://gist.github.com/bobrik/fb8ad1d7c38de0ff35ae
> >>>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
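
For reference, the pg_num/pgp_num increase and the manual reweighting
discussed in this thread boil down to commands along the following
lines. This is only a sketch: the pool name and the weight/threshold
values are placeholders, not values taken from this cluster.

  # Raise the PG count of a pool, then raise pgp_num to match so the
  # data actually remaps onto the new placement groups.
  ceph osd pool set <pool-name> pg_num 512
  ceph osd pool set <pool-name> pgp_num 512

  # Manually lower the weight of an overfull OSD (reweight takes a
  # value between 0.0 and 1.0), e.g. for osd.30 mentioned above:
  ceph osd reweight 30 0.85

  # Or let Ceph pick candidates itself, reweighting OSDs whose
  # utilization exceeds the given percentage of the cluster average:
  ceph osd reweight-by-utilization 120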