Re: Increasing PG number


 



Last summer we increased an EC 8+3 pool from 1024 to 2048 PGs on our ~1500 OSD (Kraken) cluster. This pool contained ~2 petabytes of data at the time.

 

We did a fair amount of testing on a throwaway pool on the same cluster beforehand, starting with small increases (16/32/64).

 

The main observation was that the act of splitting the PGs causes issues, not the resulting data movement, assuming your backfills are tuned to a level where they don’t affect client IO.
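To give a sense of the kind of throttling I mean, here is a minimal dry-run sketch: the option names are real Ceph OSD settings, but the values are illustrative only, and the commands are echoed rather than executed.

```shell
#!/bin/sh
# Illustrative backfill throttling -- values are examples, not recommendations.
# Commands are echoed (dry run) rather than executed.
MAX_BACKFILLS=1
RECOVERY_ACTIVE=1
echo "ceph tell 'osd.*' injectargs '--osd-max-backfills $MAX_BACKFILLS'"
echo "ceph tell 'osd.*' injectargs '--osd-recovery-max-active $RECOVERY_ACTIVE'"
```

The right values depend entirely on your hardware and client load, so test before relying on them.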

 

As PG splitting and peering (the pg_num and pgp_num increases) are a) irreversible and b) take effect immediately, overly large increases can leave you with an unhappy mess of excessive storage-node load, flapping OSDs and blocked requests.

 

We ended up doing increases of 128 PGs at a time.
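As a sketch, the stepped increase looked roughly like this (the pool name is hypothetical, the commands are only echoed as a dry run, and in practice we waited for the cluster to settle back to HEALTH_OK between steps):

```shell
#!/bin/sh
# Dry-run sketch of a stepped pg_num increase (1024 -> 2048 in steps of 128).
# POOL is a hypothetical name -- substitute your own.
POOL=mypool
CUR=1024
TARGET=2048
STEP=128
while [ "$CUR" -lt "$TARGET" ]; do
  CUR=$((CUR + STEP))
  echo "ceph osd pool set $POOL pg_num $CUR"
  echo "ceph osd pool set $POOL pgp_num $CUR"
  # In practice: wait for peering/backfill to settle before the next step.
done
```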

 

I’d hazard a guess that you will be fine going straight to 512 PGs, but the only way to be sure of the correct increase size for your cluster is to test it.

 

Cheers

Tom

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Karun Josy
Sent: 02 January 2018 16:23
To: Hans van den Bogert <hansbogert@xxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] Increasing PG number

 

https://access.redhat.com/solutions/2457321

It says it is a very intensive process and can affect cluster performance.

 

Our version is Luminous 12.2.2.

We are using an erasure coding profile for a pool 'ecpool' with k=5 and m=3.

The current PG number is 256 and the pool holds about 20 TB of data.


Should I increase it gradually, or set pg_num to 512 in one step?

 

 

 


Karun Josy

 

On Tue, Jan 2, 2018 at 9:26 PM, Hans van den Bogert <hansbogert@xxxxxxxxx> wrote:

Please refer to standard documentation as much as possible, 

 

 

Hans’ is also incomplete, since you need to change the ‘pgp_num’ as well.
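For example (echoed here as a dry run; the pool name ‘ecpool’ and the target of 512 are taken from Karun’s mail):

```shell
#!/bin/sh
# Dry run: raising pg_num splits the PGs; data only starts rebalancing
# once pgp_num is raised to match.
POOL=ecpool
TARGET=512
echo "ceph osd pool set $POOL pg_num $TARGET"
echo "ceph osd pool set $POOL pgp_num $TARGET"
```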

 

Regards,

 

Hans

 

On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev <v@xxxxxxxxxxx> wrote:

 

I increased the number of PGs in multiple pools in a production cluster on 12.2.2 recently - zero issues.

Ceph claims that increasing pg_num and pgp_num is a safe operation, which is essential for its ability to scale, and that sounds pretty reasonable to me. [1]

 

 

 

2018-01-02 18:21 GMT+03:00 Karun Josy <karunjosy1@xxxxxxxxx>:

Hi,

 

The initial PG count was not properly planned while setting up the cluster, so now there are fewer than 50 PGs per OSD.

 

What are the best practices for increasing the PG number of a pool?

We have replicated pools as well as EC pools.

 

Or is it better to create a new pool with higher PG numbers?

 

 

Karun 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 
