Ceph currently isn't very smart about ordering balancing operations: it can fill a disk before moving anything off of it, so if you are close to the toofull line it can push that OSD over. I think there is a blueprint being worked on for Hammer to help with this.
You have a couple of options. You can try bumping up the full limit to see if it unblocks things and moves off the PGs that are stuck, then drop it back down; don't go above 98%, though. You could also try reducing the replication size of one or more pools and then, after the cluster settles, increasing it back to the original size. Finally, you could try deleting some of the PGs manually in the OSD file system to get it under the full line (I'm not sure of the exact steps for this, but they have been discussed on the mailing list in the last couple of months).
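For the first option, something like the following should work on Ceph releases of this era (the 0.98 and 0.92 values here are illustrative, not recommendations; check what your cluster is actually set to first):

```shell
# Temporarily raise the cluster-wide full ratio so the stuck PGs can
# move, then restore it once backfill completes. Don't go above 0.98.
ceph pg set_full_ratio 0.98

# backfill_toofull is governed by the OSD backfill ratio (default 0.85);
# raising it can let the stuck backfills proceed.
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'

# Watch the cluster settle, then drop the ratio back to the default.
ceph -w
ceph pg set_full_ratio 0.95
```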
Good Luck!
On Mon, Jan 5, 2015 at 12:41 PM, ivan babrou <ibobrik@xxxxxxxxx> wrote:
Rebalancing is almost finished, but things got even worse: http://i.imgur.com/0HOPZil.png

Moreover, one pg is in active+remapped+wait_backfill+backfill_toofull state:

2015-01-05 19:39:31.995665 mon.0 [INF] pgmap v3979616: 5832 pgs: 23 active+remapped+wait_backfill, 1 active+remapped+wait_backfill+backfill_toofull, 2 active+remapped+backfilling, 5805 active+clean, 1 active+remapped+backfill_toofull; 11210 GB data, 26174 GB used, 18360 GB / 46906 GB avail; 65246/10590590 objects degraded (0.616%)

So at 55.8% disk space utilization ceph is full. That doesn't look very good.

On 5 January 2015 at 15:39, ivan babrou <ibobrik@xxxxxxxxx> wrote:

On 5 January 2015 at 14:20, Christian Balzer <chibi@xxxxxxx> wrote:

On Mon, 5 Jan 2015 14:04:28 +0400 ivan babrou wrote:
> Hi!
>
> I have a cluster with 106 osds and disk usage is varying from 166gb to
> 316gb. Disk usage is highly correlated to number of pg per osd (no
> surprise here). Is there a reason for ceph to allocate more pg on some
> nodes?
>
In essence what Wido said: you're a bit low on PGs.
Also, given your current utilization, pool 14 is totally oversized with 1024
PGs. You might want to re-create it with a smaller size, double pool 0
to 512 PGs, and take pool 10 to 4096.
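(As a rough sanity check, the usual rule of thumb is (number of OSDs * 100) / replica count, rounded up to the next power of two. Assuming 3 replicas, which I'm guessing at here, 106 OSDs lands on 4096:)

```shell
#!/bin/sh
# Rule-of-thumb total PG count: (num_osds * 100) / replicas,
# rounded up to the next power of two. replicas=3 is an assumption.
osds=106
replicas=3
target=$(( osds * 100 / replicas ))  # 3533
pg=1
while [ "$pg" -lt "$target" ]; do
  pg=$(( pg * 2 ))
done
echo "$pg"   # prints 4096
```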
I assume you did raise the PGPs as well when changing the PGs, right?

Yep, pg = pgp for all pools. Pool 14 is just for testing purposes; it might get large eventually.

I followed your advice in doubling pools 0 and 10. It is rebalancing at 30% degraded now, but so far the big OSDs are getting bigger and the small ones smaller: http://i.imgur.com/hJcX9Us.png. I hope that trend changes before rebalancing is complete.

And yeah, Ceph isn't particularly good at balancing stuff by itself, but
with sufficient PGs you ought to get the variance below/around 30%.

Is this going to change in future releases?

Christian
> The biggest OSDs are 30, 42 and 69 (300gb+ each) and the smallest are 87,
> 33 and 55 (170gb each). The biggest pool has 2048 PGs, and pools with very
> little data have only 8 PGs. PG size in the biggest pool is ~6gb (5.1..6.3
> actually).
>
> The lack of balanced disk usage prevents me from using all the disk space.
> When the biggest OSD is full, the cluster does not accept writes anymore.
>
> Here's gist with info about my cluster:
> https://gist.github.com/bobrik/fb8ad1d7c38de0ff35ae
>
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com