Re: PG Calculations

Mark Nelson <mark.nelson@xxxxxxxxxxx> · Fri, 14 Mar 2014 11:18:48 -0500

My personal opinion on this (not necessarily the official Inktank 
position) is that I'd rather err on the side of too many PGs for small 
clusters, while I would probably prefer to err on the side of fewer 
(though not insanely so) PGs for larger clusters.

I.e., I suspect that the difference between 2048 and 4096 PGs on a small 
cluster isn't going to be a huge deal, but going from 131072 to 262144 
PGs on a larger cluster may have bigger effects, especially on the mons.

There are things to consider here that go beyond just monitor workload 
and data distribution though.  A big one is how many objects you expect 
to have vs the number of PGs and what the per PG directory splitting 
thresholds are set to.  The more PGs you have, the more total objects 
you can place before directories get split (at the same split 
thresholds).  Whether or not you are better off with more PGs or higher 
split thresholds at high object counts isn't totally clear yet, 
especially when factoring in backfill/recovery.  These are things we are 
actively thinking about.
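
A rough sketch of the PG count versus split threshold trade-off, in Python. It assumes the FileStore rule that a PG directory splits once it holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects, with assumed settings of 2 and 10. The function names are hypothetical, and the sketch assumes objects spread perfectly evenly across PGs, which a real cluster will not do.

```python
def split_point(split_multiple=2, merge_threshold=10):
    """Objects one PG directory can hold before it splits (assumed FileStore rule)."""
    return split_multiple * abs(merge_threshold) * 16


def objects_before_split(pg_count, split_multiple=2, merge_threshold=10):
    """Total objects placed before the first split, assuming a perfectly even spread."""
    return pg_count * split_point(split_multiple, merge_threshold)


if __name__ == "__main__":
    # Doubling the PG count doubles the object count reached before splitting,
    # and so does doubling the split multiple at a fixed PG count.
    for pgs in (2048, 4096):
        print(pgs, objects_before_split(pgs))
    print(2048, objects_before_split(2048, split_multiple=4))
```

Under these assumptions the two knobs are interchangeable for delaying the first split. What the sketch cannot show is their differing cost in backfill/recovery and monitor load, which is the open question.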

Mark

On 03/14/2014 10:18 AM, Karol Kozubal wrote:
Dan, I think your interpretation is indeed correct.

The documentation on this page looks to be saying this.
http://ceph.com/docs/master/rados/operations/placement-groups/

Increasing the number of placement groups reduces the variance in
per-OSD load across your cluster. We recommend approximately 50-100
placement groups per OSD to balance out memory and CPU requirements and
per-OSD load. For a single pool of objects, you can use the following
formula:

    Total PGs = (OSDs * 100) / Replicas

Then lower on the same page…

When using multiple data pools for storing objects, you need to ensure
that you balance the number of placement groups per pool with the number
of placement groups per OSD so that you arrive at a reasonable total
number of placement groups that provides reasonably low variance per OSD
without taxing system resources or making the peering process too slow.

However, a confirmation from Inktank would be nice.

Karol

From: Dan Van Der Ster <daniel.vanderster@xxxxxxx>
Date: Friday, March 14, 2014 at 10:55 AM
To: Bradley.McNamara@xxxxxxxxxxx
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: PG Calculations

Hi,
Since you didn't get an immediate reply from a developer, I'm going to
be bold and repeat my interpretation that the documentation implies,
perhaps not clearly enough, that the 50-100 PGs per OSD rule should be
applied for the total of all pools, not per pool. I hope a dev will
correct me if I'm wrong.

With your config you must have an average of about 400 PGs per OSD. Do
you find peering/backfilling/recovery to be responsive? How are the CPU
and memory usage of your OSDs during backfilling?
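
The per-OSD figure can be checked with a small Python sketch. It assumes every PG is stored on `replicas` OSDs and that PGs are spread evenly; the function name is hypothetical, and the inputs are the numbers from the original message (3072 PGs, replica size 3, 24 OSDs).

```python
def avg_pgs_per_osd(total_pgs, replicas, osds):
    """Average number of PG copies each OSD hosts, assuming an even spread."""
    return total_pgs * replicas / osds


if __name__ == "__main__":
    print(avg_pgs_per_osd(3072, 3, 24))  # 384.0, i.e. roughly 400
```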

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --

-------- Original Message --------
From: "McNamara, Bradley" <Bradley.McNamara@xxxxxxxxxxx>
Sent: Thursday, March 13, 2014 08:03 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: PG Calculations

There was a very recent thread discussing PG calculations, and it made
me doubt my cluster setup.  So, Inktank, please provide some clarification.

I followed the documentation, and interpreted that documentation to mean
that PG and PGP calculation was based upon a per-pool calculation.  The
recent discussion introduced a slightly different formula adding in the
total number of pools:

# OSDs * 100 / 3

vs.

# OSDs * 100 / (3 * # pools)

My current cluster has 24 OSDs, a replica size of 3, and the standard
three pools: RBD, DATA, and METADATA.  My current total is 3072 PGs,
which by the second formula is way too many.  So, do I have too many?
Does it need to be addressed, or can it wait until I add more OSDs,
which will bring the ratio closer to ideal?  I’m currently using only
RBD and CephFS, no RadosGW.
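
For comparison, here is a minimal Python sketch that evaluates both formulas for this cluster. The function names are hypothetical, and rounding up to the next power of two is an assumption based on common practice, not something either formula states.

```python
def pgs_per_pool(osds, replicas, pools=1, target_per_osd=100):
    """PGs per pool; pools=1 gives the first formula, pools=N the second."""
    return osds * target_per_osd / (replicas * pools)


def next_power_of_two(n):
    """Smallest power of two that is >= n (assumed rounding convention)."""
    power = 1
    while power < n:
        power *= 2
    return power


if __name__ == "__main__":
    osds, replicas, pools = 24, 3, 3
    for label, count in (("per-pool reading", 1), ("all-pools reading", pools)):
        raw = pgs_per_pool(osds, replicas, count)
        rounded = next_power_of_two(raw)
        # Total across the three pools, and the resulting PG copies per OSD.
        total = rounded * pools
        print(label, raw, rounded, total, total * replicas / osds)
```

With these inputs the first formula gives 800 PGs per pool, which rounds up to 1024; three such pools would total 3072. The second gives about 267 per pool, or 512 after rounding, for a total of 1536.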

Thank you!

Brad

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
