Re: PG Scaling

Karol Kozubal <Karol.Kozubal@xxxxxxxxx> · Wed, 12 Mar 2014 23:42:27 +0000

Awesome, thanks for the info.

We have just begun the testing phase. I have 10Gig interfaces on both the cluster and public networks and am using fast disks, so I probably won't notice much of a difference. Since this is just a test setup I have some freedom here, but it is nice to know the consequences.

Karol

From: McNamara, Bradley <Bradley.McNamara@xxxxxxxxxxx>

Date: Wednesday, March 12, 2014 at 7:01 PM

To: Karol Kozubal <karol.kozubal@xxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>

Subject: RE: PG Scaling

Most things will cause data movement…

If you are going to have different failure zones within your CRUSH map, I would edit your CRUSH map and define those failure zones/buckets first. Injecting the new CRUSH map into the cluster will immediately cause data movement.
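A minimal sketch of the usual edit-and-inject cycle, assuming the standard ceph and crushtool utilities. The file names and the run, export_crush and inject_crush helpers are hypothetical; setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the decompile / edit / recompile / inject cycle for a CRUSH map.
# File names are arbitrary; run, export_crush and inject_crush are hypothetical helpers.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: fetch the current compiled map from the monitors and decompile it to text.
export_crush() {
  run ceph osd getcrushmap -o crushmap.bin || return 1
  run crushtool -d crushmap.bin -o crushmap.txt
}

# Step 2 (by hand): edit crushmap.txt to add the failure-zone buckets
# (for example racks) and point the placement rules at them.

# Step 3: recompile the edited text and inject it into the cluster.
# Data movement starts as soon as setcrushmap succeeds.
inject_crush() {
  run crushtool -c crushmap.txt -o crushmap.new || return 1
  run ceph osd setcrushmap -i crushmap.new
}
```

Running export_crush, editing crushmap.txt, then running inject_crush keeps a binary copy of the previous map (crushmap.bin) around, which can be re-injected with setcrushmap if the new layout is wrong.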

Once the data movement from the new CRUSH map is done, then I would change the number of placement groups. This will immediately cause data movement, too.

If you have a cluster network defined and in use, this shouldn't materially affect the running cluster. Response times may increase, but the cluster will remain fully functional.

Brad

From: Karol Kozubal [mailto:Karol.Kozubal@xxxxxxxxx]

Sent: Wednesday, March 12, 2014 1:52 PM

To: McNamara, Bradley; ceph-users@xxxxxxxxxxxxxx

Subject: Re: PG Scaling

Thank you for your response.

The number of replicas is already set to 3. So if I simply increase the number of PGs, will data also start to move, or is movement triggered only by size changes? I suppose that, since this will generate traffic on the cluster network, it is best to do this operation while the cluster isn't busy.

Karol

From: McNamara, Bradley <Bradley.McNamara@xxxxxxxxxxx>

Date: Wednesday, March 12, 2014 at 1:54 PM

To: Karol Kozubal <karol.kozubal@xxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>

Subject: RE: PG Scaling

Round up your pg_num and pgp_num to the next power of 2, which is 2048.
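A minimal sketch of the arithmetic behind the 2048 figure, assuming the (OSDs * 100) / replicas rule of thumb used in the original post. The next_pow2 and pg_target helpers are hypothetical names, not Ceph commands.

```shell
#!/bin/sh
# Rule-of-thumb PG count: (OSDs * 100) / replicas, rounded up to a power of two.
# next_pow2 and pg_target are hypothetical helper names.

# Print the smallest power of two that is >= $1 (expects $1 >= 1).
next_pow2() {
  local n=$1 p=1
  while [ "$p" -lt "$n" ]; do
    p=$((p * 2))
  done
  printf '%s\n' "$p"
}

# Print the rounded PG count: $1 = total OSDs, $2 = replicas (pool size).
pg_target() {
  next_pow2 $(( $1 * 100 / $2 ))
}
```

For 20 nodes with 3 OSDs each and 3 replicas, pg_target 60 3 computes 6000 / 3 = 2000 and prints 2048.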

Ceph will start moving data as soon as you apply the new ‘size 3’, so I would increase pg_num and pgp_num first, then increase the size. It will start creating the new PGs immediately. You can watch all of this happening with ‘ceph -w’.
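A minimal sketch of that ordering for a single pool, assuming the standard ‘ceph osd pool set <pool> <key> <value>’ syntax. The resize_pool and run helpers are hypothetical; setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the suggested order: raise pg_num, then pgp_num, then size.
# resize_pool and run are hypothetical helpers, not part of Ceph.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: resize_pool <pool> <pg_count> <size>
resize_pool() {
  local pool=$1 pgs=$2 size=$3
  # pg_num: create the new placement groups (existing PGs are split).
  run ceph osd pool set "$pool" pg_num "$pgs" || return 1
  # pgp_num: let CRUSH place data on the new PGs; rebalancing begins here.
  run ceph osd pool set "$pool" pgp_num "$pgs" || return 1
  # size: raise the replica count, which causes further data movement.
  run ceph osd pool set "$pool" size "$size"
}
```

For example, ‘for pool in volumes images compute; do resize_pool "$pool" 2048 3; done’ while ‘ceph -w’ runs in another terminal. Nothing in this sketch waits for the cluster to settle between steps; in practice you may prefer to run each step separately and let the data movement finish before the next one.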

Once the data is finished moving, you may need to run ‘ceph osd crush tunables optimal’. This should take care of any unclean PGs that may be hanging around.

It is NOT possible to decrease the number of PGs. One would need to delete the pool and recreate it.

Brad

From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Karol Kozubal

Sent: Wednesday, March 12, 2014 9:08 AM

To: ceph-users@xxxxxxxxxxxxxx

Subject: Re: PG Scaling

Correction: sorry, min_size is set to 1 everywhere.

Thank you.

Karol Kozubal

From: Karol Kozubal <karol.kozubal@xxxxxxxxx>

Date: Wednesday, March 12, 2014 at 12:06 PM

To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>

Subject: PG Scaling

Hi Everyone,

I am deploying OpenStack with Fuel 4.1 and have a 20-node Ceph deployment of C6220s, with 3 OSDs and 1 journaling disk per node. When first deployed, each storage pool is configured with the correct size and min_size attributes; however, Fuel doesn't seem to apply the correct number of PGs to the pools based on the number of OSDs that we actually have.

I make the adjustments as follows:

(20 nodes * 3 OSDs)*100 / 3 replicas = 2000

ceph osd pool set volumes size 3
ceph osd pool set volumes min_size 3
ceph osd pool set volumes pg_num 2000
ceph osd pool set volumes pgp_num 2000

ceph osd pool set images size 3
ceph osd pool set images min_size 3
ceph osd pool set images pg_num 2000
ceph osd pool set images pgp_num 2000

ceph osd pool set compute size 3
ceph osd pool set compute min_size 3
ceph osd pool set compute pg_num 2000
ceph osd pool set compute pgp_num 2000

Here are the questions I am left with concerning these changes:

How long does it take for Ceph to apply the changes and recalculate the PGs?

When is it safe to do this type of operation? Before any data is written to the pools, or is it acceptable to do it while the pools are in use?

Is it possible to scale down the number of PGs?

Thank you for your input.

Karol Kozubal

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com