On Sat, 1 Sep 2012, Xiaopong Tran wrote:
> On 09/01/2012 12:05 AM, Sage Weil wrote:
> > On Fri, 31 Aug 2012, Xiaopong Tran wrote:
> > > Hi,
> > >
> > > Ceph storage on each disk in the cluster is very unbalanced. On each
> > > node, the data seems to go to one or two disks, while other disks
> > > are almost empty.
> > >
> > > I can't find anything wrong with the crush map; it's just the
> > > default for now. Attached is the crush map.
> >
> > This is usually a problem with the pg_num for the pool you are using.
> > Can you include the output from 'ceph osd dump | grep ^pool'?  By
> > default, pools get 8 pgs, which will distribute poorly.
> >
> > sage
>
> Here is the pool I'm interested in:
>
> pool 9 'yunio2' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 8
> pgp_num 8 last_change 216 owner 0
>
> So, ok, by default the pg_num is really small. That's a very dumb
> mistake I made. Is there any easy way to change this?

I think me choosing 8 as the default was the dumb thing :)

> I looked at the tunables. If I upgrade to v0.48.1 or v0.49, would I be
> able to tune the pg_num value?

Sadly, you can't yet adjust pg_num for an active pool. You can create a
new pool with

    ceph osd pool create <name> <pg_num>

I would aim for 20 * num_osd, or thereabouts; see

    http://ceph.com/docs/master/ops/manage/grow/placement-groups/

Then you can copy the data from the old pool to the new one with

    rados cppool yunio2 yunio3

This won't be particularly fast, but it will work. You can also do

    ceph osd pool rename <oldname> <newname>
    ceph osd pool delete <name>

I hope this solves your problem!

sage
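
As a minimal sketch of that sequence with concrete values: the pool names
(yunio2, yunio3) come from the thread, but the cluster size of 30 OSDs, and
therefore the pg_num of 600 (30 * 20), is an assumption made purely for
illustration.

    # assuming ~30 OSDs, aim for roughly 30 * 20 = 600 placement groups
    ceph osd pool create yunio3 600

    # copy every object from the old pool into the new one (can take a while)
    rados cppool yunio2 yunio3

    # once the copy has been verified, drop the old pool and, if desired,
    # give the new pool the original name
    ceph osd pool delete yunio2
    ceph osd pool rename yunio3 yunio2

Any clients should point at the pool only after the delete/rename step, since
objects written to the old pool during the copy would not be carried over.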