Re: PG Sizing Question

This can be subtle and is easy to mix up.

The “PG ratio” is intended to be the number of PGs hosted on each OSD, plus or minus a few.

Note how I phrased that: it’s not simply the total number of PGs divided by the number of OSDs.  Remember that PGs are replicated.
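For example, a pool with pg_num 1024 and size 3 places 3072 PG replicas across the OSDs, and it is those replicas that count toward each OSD’s ratio.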

While each PG belongs to exactly one pool, for purposes of estimating pg_num we use this ratio to calculate the desired aggregate number of PG replicas for the whole cluster, then divide that up among pools in proportion to how much data each pool holds, ideally rounding each pool’s pg_num to a power of 2.
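
Here is a minimal sketch of that arithmetic in Python (the pool names, data fractions, and target ratio are made-up illustrations, not recommendations; rounding to the nearest power of 2 rather than always rounding up is also just one choice):

    from math import log2

    def suggested_pg_num(target_ratio, num_osds, replica_size, data_fraction):
        # The pool's share of the aggregate PG-replica budget, weighted by its
        # share of the cluster's data, rounded to the nearest power of 2.
        # For an EC pool, replica_size is k+m.
        raw = (target_ratio * num_osds * data_fraction) / replica_size
        return 2 ** round(log2(raw))

    # Hypothetical 80-OSD cluster at a target ratio of 100:
    for name, size, frac in [("rbd-a", 3, 0.45), ("rbd-b", 3, 0.45), ("ec-6-4", 10, 0.10)]:
        print(name, suggested_pg_num(100, 80, size, frac))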

You can run `ceph osd df` and see the number of PGs on each OSD.  There will be some variance, but consider the average.
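
If you would rather compute the average than eyeball it, a quick sketch along these lines works; it assumes the JSON form of `ceph osd df` exposes a per-OSD `pgs` field, which it does in the releases I have looked at, but do check your version’s output:

    import json, subprocess

    # Pull per-OSD PG counts out of `ceph osd df --format json`.
    out = subprocess.run(["ceph", "osd", "df", "--format", "json"],
                         capture_output=True, text=True, check=True).stdout
    pgs = [n["pgs"] for n in json.loads(out).get("nodes", []) if "pgs" in n]
    print(f"OSDs: {len(pgs)}  min: {min(pgs)}  max: {max(pgs)}  avg: {sum(pgs) / len(pgs):.1f}")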

This venerable calculator can help get a feel for how this works:

https://old.ceph.com/pgcalc/

100 is the official party line; it used to be 200.  More PGs means more memory use; too few has drawbacks of its own, notably less even data distribution and less parallelism.

PGs can in part be thought of as parallelism domains; more PGs means more parallelism.  So on HDD OSDs a ratio in the 100-200 range is IMHO reasonable; on SAS/SATA SSD OSDs 200-300; on NVMe OSDs perhaps higher still, though perhaps not if each device hosts more than one OSD (which should only ever be done on NVMe devices).

Your numbers below are probably OK for HDDs; if these are SSDs you might bump the pool with the most data up to the next power of 2.

The pgcalc above includes parameters for what fraction of the cluster’s data each pool contains.  A pool with 5% of the data needs fewer PGs than a pool with 50% of the cluster’s data.
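
To put numbers on that with the 80-OSD example below, at a target ratio of 100 and 3x replication: a pool holding 50% of the data works out to (100 * 80 * 0.50) / 3 ≈ 1333, i.e. 1024 or 2048 PGs depending on which way you round, while a pool holding 5% works out to (100 * 80 * 0.05) / 3 ≈ 133, i.e. 128 PGs.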

Others may well have different perspectives; this is something where opinions vary.  The pg_autoscaler in bulk mode can automate this, if one is prescient when feeding it parameters.
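
(For reference, and do check the docs for your release: that generally means marking the pool with `ceph osd pool set <pool> bulk true`, giving the autoscaler a hint via the pool’s target_size_ratio or target_size_bytes, and watching `ceph osd pool autoscale-status` to see what it intends to do.)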



> On Feb 28, 2023, at 9:23 PM, Deep Dish <deeepdish@xxxxxxxxx> wrote:
> 
> Hello
> 
> 
> 
> Looking to get some official guidance on PG and PGP sizing.
> 
> 
> 
> Is the goal to maintain approximately 100 PGs per OSD per pool, or for the
> cluster in general?
> 
> 
> 
> Assume the following scenario:
> 
> 
> 
> Cluster with 80 OSD across 8 nodes;
> 
> 3 Pools:
> 
> -       Pool1 = Replicated 3x
> 
> -       Pool2 = Replicated 3x
> 
> -       Pool3 = Erasure Coded 6-4
> 
> 
> 
> 
> 
> Assuming the well published formula:
> 
> 
> 
> Let (Target PGs / OSD) = 100
> 
> 
> 
> [ (Target PGs / OSD) * (# of OSDs) ] / (Replica Size)
> 
> 
> 
> -       Pool1 = (100*80)/3 = 2666.67 => 4096
> 
> -       Pool2 = (100*80)/3 = 2666.67 => 4096
> 
> -       Pool3 = (100*80)/10 = 800 => 1024
> 
> 
> 
> Total cluster would have 9216 PGs and PGPs.
> 
> 
> Are there any implications (performance / monitor / MDS / RGW sizing) with
> how many PGs are created on the cluster?
> 
> 
> 
> Looking for validation and / or clarification of the above.
> 
> 
> 
> Thank you.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



