Re: Expanding ceph cluster by adding more OSDs

Hi Kyle,
Thanks for your response. Though I haven't tested it, my gut feeling is the same: changing the PG number may result in a re-shuffling of the data.

Regarding the strategy you mentioned for expanding a cluster, I have a few questions:
  1. By adding a LITTLE more weight each time, my understanding is that the goal is to reduce the load on the OSDs being added, is that right? If so, can we use the recovery throttle settings (the ones I list further down) to achieve the same goal?
  2. If I want to expand the cluster by 30% of its capacity every quarter, growing it in such small steps could take a long time. Is my understanding correct?
  3. Is there an automatic tool for this, or do I need to monitor closely and dump the CRUSH map, edit it, and push it back by hand? (See the sketch right after this list.)
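
For reference, this is the manual workflow I have in mind for question 3 (untested on my side; the OSD id and weight are only placeholders):

    # Option A: nudge a single OSD's weight in small steps
    ceph osd crush reweight osd.330 0.1

    # Option B: dump the CRUSH map, edit it by hand, and push it back
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ... edit weights / buckets in crushmap.txt ...
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

Is option A enough in practice, or do people actually script option B?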

I am testing a scenario where I add one OSD at a time (I have 330 OSDs in total), using the default weight. A couple of observations: 1) recovery starts quickly (several hundred MB/s) and then slows down to around 10 MB/s; 2) it impacts the online traffic quite a lot (from my observation, mainly on the recovering PGs).
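
The settings I have been experimenting with to limit the recovery impact are below (the values are just what I am trying right now, not a recommendation):

    # lower backfill / recovery concurrency on all OSDs at runtime
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # de-prioritize recovery ops relative to client ops
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'

Do these help much in practice, or is the impact on the recovering PGs unavoidable?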

I tried to find some best practices for expanding a cluster, but with no luck. Would anybody like to share their experience? Thanks very much.

Thanks,
Guang

Date: Thu, 10 Oct 2013 05:15:27 -0700
From: Kyle Bader <kyle.bader@xxxxxxxxx>
To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: Expanding ceph cluster by adding more OSDs
Message-ID:
<CAFMfnwq+HBGsezMe3vwoM_gqCWiKd1393rxc+xB0xgT4nXqttg@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

I've contracted and expanded clusters by up to a rack of 216 OSDs (18
nodes with 12 drives each). New disks are configured with a CRUSH weight
of 0, and I slowly add weight (in 0.01 to 0.1 increments), wait for the
cluster to become active+clean, and then add more weight. I was expanding
after a contraction, so my PG count didn't need to be corrected; I tend to
be liberal and opt for more PGs. If I hadn't contracted the cluster prior
to expanding it, I would probably add PGs after all the new OSDs have
finished being weighted into the cluster.
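
Roughly, the per-OSD procedure looks like this (the id, host bucket and weights below are made up for illustration, adjust them to your map):

    # add the OSD to CRUSH with zero weight so it receives no data yet
    ceph osd crush add osd.216 0 host=node18

    # then weight it in gradually, pausing until the cluster is healthy
    # (all PGs active+clean) before each further step
    for w in 0.05 0.10 0.15 0.20; do
        ceph osd crush reweight osd.216 $w
        while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
    done

Keep stepping until the OSD reaches its target weight.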


On Wed, Oct 9, 2013 at 8:55 PM, Michael Lowe <j.michael.lowe@xxxxxxxxx> wrote:

I had those same questions. I think the answer I got was that it is
better to have too few PGs than to have overloaded OSDs, so add OSDs
first, then add PGs. I don't know the best increments to grow in; it
probably depends largely on the hardware behind your OSDs.

Sent from my iPad

On Oct 9, 2013, at 11:34 PM, Guang <yguang11@xxxxxxxxx> wrote:

Thanks Mike. I get your point.

There are still a few things confusing me:
1) We expand a Ceph cluster by adding more OSDs, which triggers a
re-balance of PGs across the old and new OSDs, and will likely break the
optimal PG count for the cluster.
2) We can add more PGs, which triggers a re-balance of objects across the
old and new PGs.

So:
1) What is the recommended way to expand the cluster by adding OSDs (and
potentially adding PGs)? Should we do both at the same time?
2) What is the recommended way to scale a cluster from, say, 1PB to 2PB?
Should we grow it in steps (1.1PB, 1.2PB, and so on) or go to 2PB directly?

Thanks,
Guang

On Oct 10, 2013, at 11:10 AM, Michael Lowe wrote:

There used to be; I can't find it right now. Something like 'ceph osd
pool set <pool> pg_num <num>' and then 'ceph osd pool set <pool> pgp_num
<num>' to actually move your data into the new PGs. I successfully did it
several months ago, when bobtail was current.
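
For a concrete example (the pool name and count are only illustrative, and note that pg_num can only be increased, not decreased):

    ceph osd pool set rbd pg_num 4096     # create the new, empty PGs (splitting)
    ceph osd pool set rbd pgp_num 4096    # let CRUSH start placing data into them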

Sent from my iPad

On Oct 9, 2013, at 10:30 PM, Guang <yguang11@xxxxxxxxx> wrote:

Thanks Mike.

Is there any documentation for that?

Thanks,
Guang

On Oct 9, 2013, at 9:58 PM, Mike Lowe wrote:

You can add PGs; the process is called splitting. I don't think PG
merging, the reduction in the number of PGs, is ready yet.

On Oct 8, 2013, at 11:58 PM, Guang <yguang11@xxxxxxxxx> wrote:

Hi ceph-users,
Ceph recommends that the number of PGs for a pool be (100 * OSDs) /
Replicas. Per my understanding, the PG count for a pool stays fixed even
when we scale the cluster out or in by adding or removing OSDs. Does that
mean that if we double the number of OSDs, the PG count for the pool is no
longer optimal, and there is no way to correct it?
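
For example, with my own numbers (330 OSDs, 3 replicas, and rounding up to a power of two as the docs suggest):

    100 * 330 / 3 = 11000  ->  16384 PGs

Doubling to 660 OSDs would suggest 32768 PGs instead, which is why I am wondering whether the count can be corrected after the fact.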


Thanks,
Guang



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
