PG calculator improvement

Hi,

I wanted to share a bad experience we had due to how the PG calculator works.

When we set up our production cluster months ago, we had to decide on the number of PGs to give each pool in the cluster. As you know, the PG calc would recommend giving a lot of PGs to pools that are heavy in size, regardless of the number of objects in them. How bad...

We essentially had 3 pools to set up on 144 OSDs:

1. An EC5+4 pool for the radosGW (.rgw.buckets) that would hold 80% of all data in the cluster. PG calc recommended 2048 PGs.
2. An EC5+4 pool for Zimbra's data (emails) that would hold 20% of all data. PG calc recommended 512 PGs.
3. A replicated pool for Zimbra's metadata (null-size objects holding xattrs, used for deduplication) that would hold ~0% of all data. PG calc recommended 128 PGs, but we decided on 256.
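For reference, what the PG calc does with these inputs is roughly: target PGs per OSD x number of OSDs x the pool's data percentage, divided by the pool size (replica count, or k+m for EC), rounded up to a power of two. Below is a minimal Python sketch of that idea, assuming a target of 100 PGs per OSD and ignoring the per-pool minimums the real tools apply (which is why a tiny pool like #3 comes out differently):

def next_power_of_two(n):
    p = 1
    while p < n:
        p *= 2
    return p

def recommended_pgs(num_osds, data_percent, pool_size, target_pgs_per_osd=100):
    # pool_size: replica count for replicated pools, k+m for EC pools
    raw = (target_pgs_per_osd * num_osds * data_percent) / float(pool_size)
    return next_power_of_two(raw)

print(recommended_pgs(144, 0.80, 9))   # pool #1, EC5+4 .rgw.buckets -> 2048
print(recommended_pgs(144, 0.20, 9))   # pool #2, EC5+4 emails       -> 512
print(recommended_pgs(144, 0.00, 3))   # pool #3: ~0% of the data, so the
                                       # size-based formula gives next to nothing

There is no object count anywhere in that formula.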

With 120M objects in pool #3, we hit the Jewel scrubbing bug (OSDs flapping) as soon as we upgraded to Jewel. Before we could upgrade to a patched Jewel and scrub the whole cluster again prior to increasing the number of PGs on this pool, we had to take more than a hundred snapshots (for backup/restoration purposes), with the number of objects still increasing in the pool. Then, when a snapshot was removed, we hit the current Jewel snap trimming bug that affects pools with too many objects for their number of PGs. The only way we could stop the trimming was to stop OSDs, which left PGs degraded and no longer trimming (snap trimming only happens on active+clean PGs).

We're now just getting out of this hole, thanks to Nick's post regarding osd_snap_trim_sleep and to the expertise of RHCS support.

If the PG calc had considered not only the pools' weight but also the expected number of objects in each pool (which we knew at the time), we wouldn't have hit these 2 bugs.
We hope this will help improve the ceph.com and RHCS PG calculators.
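To make the suggestion concrete, here is a purely illustrative Python sketch (the "max objects per PG" threshold and the function name are our own assumptions, not something the current calculators implement): take the max of the size-based recommendation and an object-based one.

def next_power_of_two(n):
    # same helper as in the sketch above
    p = 1
    while p < n:
        p *= 2
    return p

def recommended_pgs_with_objects(num_osds, data_percent, pool_size,
                                 expected_objects, max_objects_per_pg=100000,
                                 target_pgs_per_osd=100):
    by_size = (target_pgs_per_osd * num_osds * data_percent) / float(pool_size)
    by_objects = expected_objects / float(max_objects_per_pg)
    return next_power_of_two(max(by_size, by_objects))

# Pool #3: ~0% of the data but 120M (and growing) objects on 144 OSDs
print(recommended_pgs_with_objects(144, 0.0, 3, 120e6))  # -> 2048 instead of 128/256

Whatever the exact threshold, simply asking the user for an expected object count would have steered us away from 256 PGs for pool #3.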

Regards,

Frédéric.

--

Frédéric Nass

Sous-direction Infrastructures
Direction du Numérique
Université de Lorraine

Tél : +33 3 72 74 11 35



