Hi all,

We have been working closely with Kostis on this and we have some results we thought we should share.

Increasing the PGs was mandatory for us, since we had been noticing fragmentation* issues on many OSDs. We had also been below the recommended number of PGs for our main pool for quite some time (we ~tripled the number of OSDs; at the moment we have ~140).

*fragmentation = many OSDs had uneven usage (not all objects have the same size). By increasing the PG number we got better dispersion.

Increasing the PG/PGP number simply causes a recovery operation, since the objects need to be redistributed. The recovery operation "honors" the osd_recovery_max_active and osd_max_backfills settings, which was our main concern. Since this is (was) the case, we decided to start by adding a few PGs (~20). The impact was trivial. At our last increase we went with increments of 512, which caused a degradation of ~5%. With recovery/backfills set to 1 we get a full recovery in ~6 hours. This is acceptable for us for now.

* We started at 1800 PGs in our main pool (this is the pool that holds the majority of the data). The degradation ratio is of course proportional to the size of the PG increase and the amount of data (at the moment we have reached 6120 PGs; our goal for now is 8192).

Regarding the issue of SSDs, we have noticed the same journal latency using 15k SAS disks and standard SSDs (unfortunately on different controllers, with 1 journal per device). What we would like is to have multiple journals on the same disk (SSDs are expensive and the journal size needed is relatively small). If you have some insight on this, please let us know.
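
For anyone who wants to plan a similar stepwise increase, here is a rough Python sketch of the arithmetic. It only prints the commands; it does not run anything against a cluster. The pool name `main-pool` and the replica count of 3 are assumptions made for illustration (neither is stated above), and the "recommended" figure uses the common rule of thumb of roughly 100 PGs per OSD divided by the replica count, rounded up to a power of two, which may not be the exact guideline that applies to your release.

```python
def recommended_pgs(num_osds, replicas, pgs_per_osd=100):
    """Rule-of-thumb PG count: (OSDs * pgs_per_osd) / replicas,
    rounded up to the next power of two."""
    raw = num_osds * pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


def pg_steps(current, target, step):
    """Intermediate pg_num values from current up to target,
    growing by at most `step` PGs per round."""
    steps = []
    while current < target:
        current = min(current + step, target)
        steps.append(current)
    return steps


def plan_commands(pool, current, target, step):
    """Build the command lines for each round (printed, not executed)."""
    # Throttle recovery first, as described in the post (backfills/recovery = 1).
    cmds = ["ceph tell osd.* injectargs "
            "'--osd-max-backfills 1 --osd-recovery-max-active 1'"]
    for pg in pg_steps(current, target, step):
        cmds.append(f"ceph osd pool set {pool} pg_num {pg}")
        cmds.append(f"ceph osd pool set {pool} pgp_num {pg}")
        cmds.append("# wait for the cluster to return to HEALTH_OK before the next round")
    return cmds


if __name__ == "__main__":
    # ~140 OSDs is from the post; replicas=3 is an assumption.
    print("rule-of-thumb target:", recommended_pgs(140, 3))
    # 6120 -> 8192 in increments of 512, the figures mentioned in the post.
    # "main-pool" is a hypothetical pool name.
    for line in plan_commands("main-pool", 6120, 8192, 512):
        print(line)
```

With the figures above (6120 PGs now, a goal of 8192, increments of 512) this gives five rounds, the last one smaller than 512. Letting recovery finish between rounds keeps the degradation per round bounded by the step size.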