[ceph-users] Throttle pool pg_num/pgp_num increase impact

kostikas@xxxxxxxxx (Konstantinos Tompoulidis) · Thu, 31 Jul 2014 10:55:08 +0000 (UTC)

Hi all,

We have been working closely with Kostis on this and we have some results we 
thought we should share.

Increasing the number of PGs was mandatory for us since we have been noticing 
fragmentation* issues on many OSDs. Also, we had been below the recommended 
PG count for our main pool for quite some time (we ~tripled the number of OSDs 
- at the moment ~140).

*fragmentation = many OSDs had uneven usage (not all objects have the same 
size).
By increasing the PG number we got better dispersion.
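
For reference, the "recommended number" above can be approximated with the usual rule of thumb from the Ceph documentation: roughly (OSDs x 100) / replicas, rounded up to the next power of two. The sketch below is only an illustration; the replica count of 3 is an assumption (it is not stated in this thread), and the function name is made up for the example.

```python
def recommended_pg_count(num_osds: int, replicas: int, pgs_per_osd: int = 100) -> int:
    """Rule-of-thumb PG count: (OSDs * pgs_per_osd) / replicas,
    rounded up to the next power of two."""
    if num_osds <= 0 or replicas <= 0 or pgs_per_osd <= 0:
        raise ValueError("num_osds, replicas and pgs_per_osd must be positive")
    raw = (num_osds * pgs_per_osd) / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


# ~140 OSDs as mentioned above; replicas=3 is an ASSUMPTION, not from the thread.
print(recommended_pg_count(140, 3))  # 140 * 100 / 3 ~= 4667, rounded up to 8192
```

Under that assumption the result matches the 8192 goal for a pool that holds most of the data.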

Increasing the PG/PGP number simply causes a recovery operation since the 
objects need to be redistributed. Our main concern was whether that recovery 
would honor the osd_recovery_max_active and osd_max_backfills settings; it does.

Since this is the case, we decided to start by adding a few PGs (~20). 
The impact was trivial. For our latest increases we went with increments of 
512. Each such increment caused a degradation of ~5%. With recovery and 
backfills both set to 1, we get a full recovery in ~6 hours. This is 
acceptable for us for now.
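
A minimal sketch of what one such throttled step looks like on the command line. It only builds and prints the commands, it does not run them. The pool name "main-pool" is hypothetical, and 6632 is simply 6120 + 512 taken as an example target; `ceph tell osd.* injectargs` and `ceph osd pool set <pool> pg_num|pgp_num` are the standard CLI forms.

```python
import shlex
from typing import List


def throttle_command(limit: int = 1) -> List[str]:
    """Build the command that limits recovery/backfill concurrency on all OSDs."""
    return [
        "ceph", "tell", "osd.*", "injectargs",
        f"--osd-max-backfills {limit} --osd-recovery-max-active {limit}",
    ]


def pg_step_commands(pool: str, new_pg_num: int) -> List[List[str]]:
    """Build the two commands for one increase: pg_num first, then pgp_num."""
    return [
        ["ceph", "osd", "pool", "set", pool, "pg_num", str(new_pg_num)],
        ["ceph", "osd", "pool", "set", pool, "pgp_num", str(new_pg_num)],
    ]


# "main-pool" is a HYPOTHETICAL pool name; 6632 is an example target (6120 + 512).
for argv in [throttle_command(1)] + pg_step_commands("main-pool", 6632):
    print(shlex.join(argv))  # shell-quoted, ready to review before running
```

The next step would only be issued once the cluster has fully recovered from the previous one.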

* We started at 1800 PGs in our main pool (this is the pool that holds the 
majority of data). The degradation ratio is of course proportional to the 
PG_increase_number and the amount of data (at the moment we have reached 
6120 PGs - our goal for now is 8192).
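
To make the remaining path concrete, here is a small sketch that lists the pg_num targets from 6120 up to 8192 in increments of 512. The function name is made up for the example, and it assumes the 512 increment is kept for every remaining step, which may not be the case in practice.

```python
from typing import List


def pg_targets(current: int, goal: int, step: int) -> List[int]:
    """Return the successive pg_num targets from current up to goal."""
    if step <= 0:
        raise ValueError("step must be positive")
    if goal < current:
        raise ValueError("goal must not be below the current pg_num")
    targets = []
    while current < goal:
        # Never overshoot the goal on the final step.
        current = min(current + step, goal)
        targets.append(current)
    return targets


print(pg_targets(6120, 8192, 512))  # [6632, 7144, 7656, 8168, 8192]
```

That is five more increases, the last one adding only 24 PGs.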

Regarding the issue of SSDs, we have noticed the same journal latency using 
15k SAS and standard SSDs (unfortunately on different controllers - 1 
journal per device). What we would like is to have multiple journals on the 
same disk (SSDs are expensive and the journal size needed is relatively 
small). If you have some insight on this, please let us know.