Re: Increasing PG number

That script was mine. We were creating the PGs in chunks of 256 at a time, with nobackfill and norecover set, until we had added 4k PGs. We used the script because the peering caused by adding thousands of PGs at once was causing problems for client I/O. We did that four times (backfilling in between) in each cluster, going from 16k PGs to 32k PGs. I wouldn't be too worried about going up to 512 PGs as long as you've calculated it out and that is the proper number of PGs for you. I would agree that increasing your PGs is a very intensive operation for the cluster, but 500 at a time shouldn't be a big problem as long as your settings are configured such that backfilling doesn't impact your client I/O more than you're willing to let it.

We also capped each round at 4k PGs because of how long the backfilling would take; we found that amount would finish backfilling within a fast enough timeframe for our needs.
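
For reference, the core of that approach can be sketched roughly as below. This is only a sketch, not the original script: the pool name, chunk size and target are placeholders, and the real script had more error handling around it.

    #!/bin/bash
    # Sketch: raise pg_num/pgp_num in chunks of 256 with backfill and recovery
    # paused, then release the flags and let the cluster backfill in one go.
    POOL=ecpool      # placeholder pool name
    TARGET=4096      # placeholder final pg_num for this round
    CHUNK=256

    ceph osd set nobackfill
    ceph osd set norecover

    CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        CUR=$((CUR + CHUNK))
        ceph osd pool set "$POOL" pg_num "$CUR"
        # pgp_num could equally be raised once at the end of the round
        ceph osd pool set "$POOL" pgp_num "$CUR"
        sleep 60     # let peering settle before the next chunk
    done

    ceph osd unset nobackfill
    ceph osd unset norecover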

On Wed, Jan 3, 2018, 5:48 PM Christian Wuerdig <christian.wuerdig@xxxxxxxxx> wrote:
A while back there was a thread on the ML where someone posted a bash
script to slowly increase the number of PGs in steps of 256 AFAIR. The
script would monitor the cluster activity and, once all data shuffling
had finished, do another round until the target was hit.

That was on filestore though, and on hammer or jewel; not sure if you
can go faster on bluestore or luminous in general.
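
Such a step-and-wait loop boils down to something like the sketch below. The pool name, step size and the crude "all PGs active+clean" check are assumptions for illustration, not the original script.

    #!/bin/bash
    # Sketch: grow pg_num/pgp_num in small steps, waiting for the data
    # shuffling to finish before starting the next step.
    POOL=rbd         # placeholder pool name
    TARGET=2048      # placeholder final pg_num
    STEP=256

    settled() {
        # Crude check: ceph pg stat prints e.g. "2048 pgs: 2048 active+clean; ..."
        # when nothing is degraded, misplaced or backfilling.
        ceph pg stat | grep -q 'pgs: [0-9]* active+clean;'
    }

    CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        CUR=$((CUR + STEP))
        [ "$CUR" -gt "$TARGET" ] && CUR=$TARGET
        ceph osd pool set "$POOL" pg_num "$CUR"
        ceph osd pool set "$POOL" pgp_num "$CUR"
        sleep 30                     # give peering a moment to start
        until settled; do sleep 60; done
    done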

On Thu, Jan 4, 2018 at 12:04 AM,  <tom.byrne@xxxxxxxxxx> wrote:
> Last summer we increased an EC 8+3 pool from 1024 to 2048 PGs on our ~1500
> OSD (Kraken) cluster. This pool contained ~2 petabytes of data at the time.
>
>
>
> We did a fair amount of testing on a throwaway pool on the same cluster
> beforehand, starting with small increases (16/32/64).
>
>
>
> The main observation was that the act of splitting the PGs causes issues,
> not the resulting data movement, assuming your backfills are tuned to a
> level where they don’t affect client IO.
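
The tuning referred to here is usually the OSD backfill/recovery throttles; a rough illustration with placeholder values (not the poster's actual settings) would be:

    # Throttle backfill/recovery on all OSDs at runtime (illustrative values)
    ceph tell osd.* injectargs \
        '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'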
>
>
>
> As the PG splitting and peering (pg_num and pgp_num) increases are a)
> non-reversible and b) take effect immediately, overly large increases
> can end up in an unhappy mess of excessive storage node load, OSDs
> flapping and blocked requests.
>
>
>
> We ended up doing increases of 128 PGs at a time.
>
>
>
> I’d hazard a guess that you will be fine going straight to 512 PGs, but the
> only way to be sure of the correct increase size for your cluster is to test
> it.
>
>
>
> Cheers
>
> Tom
>
>
>
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Karun Josy
> Sent: 02 January 2018 16:23
> To: Hans van den Bogert <hansbogert@xxxxxxxxx>
> Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Increasing PG number
>
>
>
> https://access.redhat.com/solutions/2457321
>
> It says it is a very intensive process and can affect cluster performance.
>
>
>
> Our Version is Luminous 12.2.2
>
> And we are using an erasure coding profile for a pool 'ecpool', with k=5 and m=3.
>
> Current PG number is 256 and it has about 20 TB of data.
>
>
> Should I increase it gradually, or set pg_num to 512 in one step?
>
>
>
>
>
>
>
>
> Karun Josy
>
>
>
> On Tue, Jan 2, 2018 at 9:26 PM, Hans van den Bogert <hansbogert@xxxxxxxxx>
> wrote:
>
> Please refer to standard documentation as much as possible,
>
>
>
>
> http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#set-the-number-of-placement-groups
>
>
>
> Han’s post is also incomplete, since you need to change ‘pgp_num’ as
> well.
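
Concretely, that means issuing both commands, pg_num first and pgp_num afterwards; a sketch using the 'ecpool' name and the 512 figure from the question above:

    ceph osd pool set ecpool pg_num 512
    ceph osd pool set ecpool pgp_num 512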
>
>
>
> Regards,
>
>
>
> Hans
>
>
>
> On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev <v@xxxxxxxxxxx> wrote:
>
>
>
> Increased number of PGs in multiple pools in a production cluster on 12.2.2
> recently - zero issues.
>
> Ceph claims that increasing pg_num and pgp_num is a safe operation,
> which is essential for its ability to scale, and this sounds pretty
> reasonable to me. [1]
>
>
>
>
>
> [1]
> https://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/
>
>
>
> 2018-01-02 18:21 GMT+03:00 Karun Josy <karunjosy1@xxxxxxxxx>:
>
> Hi,
>
>
>
> The initial PG count was not properly planned while setting up the
> cluster, so now there are fewer than 50 PGs per OSD.
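
For context, the usual rule of thumb (from the Ceph docs / pgcalc) targets on the order of 100 PGs per OSD across all pools; a hedged back-of-the-envelope with a hypothetical OSD count looks like:

    # Hypothetical sizing example (the OSD count is made up, not this cluster's):
    #   pool pg_num ~= (num_osds * 100) / pool_size, rounded to a power of two
    # e.g. 40 OSDs and an EC pool with k=5, m=3 (pool size 8):
    #   (40 * 100) / 8 = 500  ->  round up to 512 PGs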
>
>
>
> What are the best practices to increase the PG number of a pool?
>
> We have replicated pools as well as EC pools.
>
>
>
> Or is it better to create a new pool with a higher PG number?
>
>
>
>
>
> Karun
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
