Re: RADOS + deep scrubbing performance issues in production environment

Mike Dawson <mike.dawson@xxxxxxxxxxxx> · Tue, 28 Jan 2014 01:30:46 -0500

On 1/27/2014 1:45 PM, Sage Weil wrote:
There is also

  ceph osd set noscrub

and then later

  ceph osd unset noscrub

In my experience scrub isn't nearly as much of a problem as deep-scrub.
On an IOPS-constrained cluster, where writes approach the available
aggregate spindle performance (minus the replication penalty and possibly
the co-located OSD journal penalty), scrub may run without any disruption.
But deep-scrub tends to make iowait on the spindles get ugly.

To disable/enable deep-scrub use:

ceph osd set nodeep-scrub
ceph osd unset nodeep-scrub
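
A minimal sketch of how the two commands above could bracket a busy period, so the flag is not left set by accident. The names CEPH_BIN and with_nodeep_scrub are made up for illustration and are not part of Ceph. As noted elsewhere in this thread, it is unclear whether a deep-scrub already in progress stops immediately when the flag is set.

```shell
#!/bin/sh
# Sketch: hold off new deep-scrubs for the duration of one command.
# CEPH_BIN and with_nodeep_scrub are illustrative names, not part of Ceph.
CEPH_BIN="${CEPH_BIN:-ceph}"

with_nodeep_scrub() {
    # Set the cluster-wide flag; give up if that fails.
    "$CEPH_BIN" osd set nodeep-scrub || return 1
    # Run the caller's command and remember its exit status.
    "$@"
    rc=$?
    # Always clear the flag again so deep-scrub is not left disabled.
    "$CEPH_BIN" osd unset nodeep-scrub
    return "$rc"
}
```

Usage might look like `with_nodeep_scrub ./nightly-import.sh`, where the script name is a placeholder for whatever I/O-heavy job needs the spindles to itself.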

I forget whether this pauses an in-progress PG scrub or just makes it stop
when it gets to the next PG boundary.

sage

On Mon, 27 Jan 2014, Kyle Bader wrote:

Are there any tools we are not aware of for controlling, possibly pausing,
deep-scrub and/or getting progress information about the procedure?
Also, since I believe it would be bad practice to disable deep-scrubbing, do you
have any recommendations on how to work around (or even solve) this issue?

The periodicity of scrubs is controllable with these tunables:

osd scrub max interval
osd deep scrub interval

You may also be interested in adjusting:

osd scrub load threshold

More information on the docs page:

http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing
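
For illustration, a hedged sketch of changing these three tunables on running OSDs via injectargs, assuming a release where `ceph tell osd.* injectargs` is available. The helper name set_scrub_tunables and the CEPH_BIN variable are made up here, and the values in the usage note are placeholders, not recommendations. Settings injected this way do not survive an OSD restart; persistent values belong in ceph.conf.

```shell
#!/bin/sh
# Sketch: push scrub tunables to all running OSDs without a restart.
# CEPH_BIN and set_scrub_tunables are illustrative names, not part of Ceph.
CEPH_BIN="${CEPH_BIN:-ceph}"

set_scrub_tunables() {
    max_interval="$1"    # osd scrub max interval, in seconds
    deep_interval="$2"   # osd deep scrub interval, in seconds
    load_threshold="$3"  # osd scrub load threshold, a load average
    # 'osd.*' is quoted so the shell does not expand it as a glob.
    "$CEPH_BIN" tell 'osd.*' injectargs \
        "--osd_scrub_max_interval $max_interval --osd_deep_scrub_interval $deep_interval --osd_scrub_load_threshold $load_threshold"
}
```

For example, `set_scrub_tunables 604800 1209600 2.0` would ask for a one-week max scrub interval, a two-week deep-scrub interval, and a load threshold of 2.0.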

I rarely run into a situation where the 1-minute load average is below 0.5
on a multi-core server running OSDs, so deep scrub for me is always
triggered by the 'osd scrub max interval'. I filed a bug asking for core
count to be taken into consideration:

http://tracker.ceph.com/issues/6296

The documentation used to say "Default is 50%", implying that this
feature should allow scrub to start at a much higher load than 0.5
allows on multi-core systems. The documentation has changed, but the
default of 0.5 still artificially prevents deep-scrub from
starting opportunistically on relatively idle multi-core systems.
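
To make the core-count idea concrete, here is a rough sketch of a per-core comparison. It is not how Ceph evaluates the threshold; load_allows_scrub is a hypothetical helper that only illustrates dividing the load average by the number of cores before comparing it to the threshold.

```shell
#!/bin/sh
# Sketch of the idea in issue 6296: compare load per core, not raw load.
# load_allows_scrub is a hypothetical helper, not Ceph's actual logic.
load_allows_scrub() {
    load="$1"       # 1-minute load average
    cores="$2"      # number of CPU cores
    threshold="$3"  # per-core threshold, e.g. 0.5
    # awk does the floating-point comparison; exit 0 means "scrub may start".
    awk -v l="$load" -v c="$cores" -v t="$threshold" \
        'BEGIN { exit !((l / c) < t) }'
}
```

With a load of 3.0, an 8-core host passes a 0.5 per-core threshold (3.0 / 8 = 0.375), while the same load compared directly against 0.5 never would. On Linux the inputs could come from `cut -d' ' -f1 /proc/loadavg` and `nproc`.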

That being said, deep-scrub may be better served with an 
osd_scrub_iops_threshold mechanism instead of (or in addition to) the 
osd_scrub_load_threshold.

- Mike

Hope that helps some!

--

Kyle
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
