Sage Weil <sweil at ...> writes:

> On Mon, 4 Aug 2014, Konstantinos Tompoulidis wrote:
> > Hi all,
> >
> > We recently added many OSDs to our production cluster.
> > This brought us to a point where the number of PGs we had assigned to
> > our main (heavily used) pool was well below the recommended value.
> >
> > We increased the PG number (incrementally, to avoid huge degradation
> > ratios) to the recommended optimal value.
> >
> > Once the procedure ended, we noticed that the output of "ceph df"
> > (POOLS:) does not represent the actual state.
>
> How did it mismatch reality?

At the moment the size reported for the main pool is 2.3 times its actual
size.

> > Has anyone noticed this before, and if so, is there a fix?
>
> There is some ambiguity in the stats after PG split that gets cleaned up
> on the next scrub. I wouldn't expect it to be noticeable, though ...
>
> sage

Unfortunately, it is quite noticeable.

...

Our setup* serves a high-IO, low-latency cloud infrastructure: the disks of
the VMs are block devices on the hypervisor, and the block devices are
exposed to the OS by an open-source in-house implementation that relies on
librados.

Due to the added load that scrub and deep scrub impose on the cluster, we
decided to disable these operations. Re-enabling them has a huge negative
impact on the performance of the infrastructure.

Is scrubbing absolutely necessary? If yes, is there a way to mitigate its
impact on performance?
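[Since, per Sage's reply, the post-split stats are only reconciled by the next scrub, disabling scrubbing outright also leaves "ceph df" wrong indefinitely; throttling is usually the better trade-off. A minimal ceph.conf sketch of the scrub-throttling knobs, assuming a Firefly-era cluster -- the values are illustrative assumptions, not tuned recommendations, and option availability varies by Ceph release:]

```ini
; ceph.conf -- illustrative scrub throttling (values are assumptions,
; not recommendations)
[osd]
osd max scrubs = 1                 ; at most one concurrent scrub per OSD
osd scrub load threshold = 0.5     ; skip scheduled scrubs while load is high
osd scrub min interval = 86400     ; don't re-scrub a PG more than daily
osd deep scrub interval = 2419200  ; deep-scrub each PG every ~4 weeks
```

Scrubbing can also be paused temporarily around peak hours with `ceph osd set noscrub` and `ceph osd set nodeep-scrub` (and resumed with the corresponding `unset`), which is safer than disabling it permanently: deep scrub is the mechanism that detects silent on-disk corruption, and a manual `ceph pg scrub <pgid>` on the affected pool's PGs should clean up the stale stats.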