PG calculator improvement

Hi,

I wanted to share a bad experience we had due to how the PG calculator works.

When we set up our production cluster months ago, we had to decide on the number of PGs to give each pool in the cluster. As you know, the PG calc would recommend giving a lot of PGs to pools that are heavy in size, regardless of the number of objects in them. How bad...

We essentially had 3 pools to set up on 144 OSDs:

1. An EC5+4 pool for the radosGW (.rgw.buckets) that would hold 80% of all data in the cluster. PG calc recommended 2048 PGs.
2. An EC5+4 pool for Zimbra's data (emails) that would hold 20% of all data. PG calc recommended 512 PGs.
3. A replicated pool for Zimbra's metadata (null-size objects holding xattrs, used for deduplication) that would hold ~0% of all data. PG calc recommended 128 PGs, but we decided on 256.
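For reference, what the PG calc does with these inputs is roughly: target PGs per OSD x number of OSDs x the pool's data percentage, divided by the pool size (replica count, or k+m for EC), rounded up to a power of two. Below is a minimal Python sketch of that idea, assuming a target of 100 PGs per OSD and ignoring the per-pool minimums the real tools apply (which is why a tiny pool like #3 comes out differently):

def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def recommended_pgs(num_osds, data_percent, pool_size, target_pgs_per_osd=100):
    # pool_size: replica count for replicated pools, k+m for EC pools
    raw = (target_pgs_per_osd * num_osds * data_percent) / float(pool_size)
    return next_power_of_two(raw)

print(recommended_pgs(144, 0.80, 9))   # pool #1, EC5+4 .rgw.buckets -> 2048
print(recommended_pgs(144, 0.20, 9))   # pool #2, EC5+4 emails       -> 512
print(recommended_pgs(144, 0.00, 3))   # pool #3: ~0% of the data, so the
                                       # size-based formula gives next to nothing

There is no object count anywhere in that formula.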

With 120M objects in pool #3, we hit the Jewel scrubbing bug (OSDs flapping) as soon as we upgraded to Jewel. Before we could upgrade to a patched Jewel and scrub the whole cluster again prior to increasing the number of PGs on this pool, we had to take more than a hundred snapshots (for backup/restoration purposes), with the number of objects still increasing in the pool. Then, when a snapshot was removed, we hit the current Jewel snap trimming bug that affects pools with too many objects for their number of PGs. The only way we could stop the trimming was to stop OSDs, which left PGs degraded and no longer trimming (snap trimming only happens on active+clean PGs).

We're now just getting out of this hole, thanks to Nick's post regarding osd_snap_trim_sleep and to the expertise of RHCS support.

If the PG calc had considered not only the pools' weight but also the expected number of objects in each pool (which we knew at the time), we wouldn't have hit these 2 bugs.
We hope this will help improve the ceph.com and RHCS PG calculators.
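To make the suggestion concrete, here is a purely illustrative Python sketch (the "max objects per PG" threshold and the function name are our own assumptions, not something the current calculators implement): take the max of the size-based recommendation and an object-based one.

def next_power_of_two(n):
    # same helper as in the sketch above
    p = 1
    while p < n:
        p *= 2
    return p

def recommended_pgs_with_objects(num_osds, data_percent, pool_size,
                                 expected_objects, max_objects_per_pg=100000,
                                 target_pgs_per_osd=100):
    by_size = (target_pgs_per_osd * num_osds * data_percent) / float(pool_size)
    by_objects = expected_objects / float(max_objects_per_pg)
    return next_power_of_two(max(by_size, by_objects))

# Pool #3: ~0% of the data but 120M (and growing) objects on 144 OSDs
print(recommended_pgs_with_objects(144, 0.0, 3, 120e6))  # -> 2048 instead of 128/256

Whatever the exact threshold, simply asking the user for an expected object count would have steered us away from 256 PGs for pool #3.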

Regards,

Frédéric.

--

Frédéric Nass

Sous-direction Infrastructures
Direction du Numérique
Université de Lorraine

Tél : +33 3 72 74 11 35



