Erroneous stats output (ceph df) after increasing PG number

Konstantinos Tompoulidis <kostikas at ...> writes:

> 
> Sage Weil <sweil <at> ...> writes:
> 
> > 
> > On Mon, 4 Aug 2014, Konstantinos Tompoulidis wrote:
> > > Hi all,
> > > 
> > > We recently added many OSDs to our production cluster.
> > > This brought us to a point where the number of PGs we had assigned to our
> > > main (heavily used) pool was well below the recommended value.
> > > 
> > > We increased the PG number (incrementally, to avoid huge degradation
> > > ratios) to the recommended optimal value.
> > > 
> > > Once the procedure ended we noticed that the output of "ceph df" (POOLS:)
> > > does not represent the actual state.
> > 
> > How did it mismatch reality?
> 
> At the moment the size reported for the main pool is 2.3 times its actual
> size.
> 
> > 
> > > Has anyone noticed this before and if so is there a fix?
> > 
> > There is some ambiguity in the stats after PG split that gets cleaned up 
> > on the next scrub.  I wouldn't expect it to be noticeable, though ...
> > 
> > sage
> > 
> 
> Unfortunately it is quite noticeable.
> ...
> Our setup* serves a high-IO, low-latency cloud infrastructure (the disks of
> the VMs are block devices on the hypervisor; the block devices are exposed
> to the OS by an open source in-house implementation which relies on
> librados). Due to the added load that scrub and deep scrub impose on the
> cluster, we decided to disable these operations. Re-enabling them has a huge
> negative impact on the performance of the infrastructure.
> 
> Is scrubbing absolutely necessary?
> If yes, is there a way to mitigate its impact on performance?
> 

Hi all,

We decided to perform a scrub and see the impact now that we have 4x the PGs.
It seems that, now that the PGs are "smaller", the impact is not that high.
We kept osd-max-scrubs at 1, which is the default setting.
Indeed, the output of "ceph df" was fixed.
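
In case it helps anyone hitting the same issue, the steps described above map
onto the standard CLI roughly as follows. This is only a sketch: the pool name,
the PG counts and the admin-socket path below are placeholders, not our actual
values.

  # Grow the pool in increments rather than one big jump: raise pg_num, let
  # the new PGs be created, then raise pgp_num so data actually rebalances,
  # and wait for the cluster to settle before the next step.
  # ("rbd-pool" and the counts are placeholders.)
  ceph osd pool set rbd-pool pg_num 2048
  ceph osd pool set rbd-pool pgp_num 2048
  ceph osd pool set rbd-pool pg_num 4096
  ceph osd pool set rbd-pool pgp_num 4096

  # Re-enable scrubbing if it had been disabled via the cluster-wide flags.
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub

  # osd_max_scrubs defaults to 1 (at most one active scrub per OSD); we left
  # it at that default. It can be checked on an OSD host, e.g.:
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_max_scrubs

  # Once the PGs have been (deep-)scrubbed, the per-pool numbers here settle
  # back to the real usage.
  ceph df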

Another issue that came up is that, during the scrub process, we do not get
any feedback on the client r/w IO and op/s. This makes some sense, since it
must be difficult to distinguish scrub-related IO/traffic from client traffic.

Is there a workaround for this? Does Calamari experience the same issue
during a scrub?

Thanks in advance!
Konstantinos


