Re: [ceph-users] Flapping osd / continuously reported as failed


 



On Thu, Jul 25, 2013 at 12:47 AM, Mostowiec Dominik
<Dominik.Mostowiec@xxxxxxxxxxxx> wrote:
> Hi
> We found something else.
> After osd.72 flapped, one PG, '3.54d', was recovering for a long time.
>
> --
> ceph health detail
> HEALTH_WARN 1 pgs recovering; recovery 1/39821745 degraded (0.000%)
> pg 3.54d is active+recovering, acting [72,108,23]
> recovery 1/39821745 degraded (0.000%)
> --
>
> The last down/up flap of osd.72 was at 00:45.
> In logs we found:
> 2013-07-24 00:45:02.736740 7f8ac1e04700  0 log [INF] : 3.54d deep-scrub ok
> After this time everything is ok.
>
> Is it possible that the reason this osd was flapping was scrubbing?
>
> We have default scrubbing settings (ceph version 0.56.6).
> If scrubbing is the trouble-maker, can we make it a little lighter by changing the config?

It's possible, as deep scrub in particular adds a bit of load (it
goes through and compares the object contents). Are you no longer
seeing any flapping, and did you try to find when the deep scrub
started, to see if it matched up with your troubles?

I'd be hesitant to turn it off as scrubbing can uncover corrupt
objects etc, but you can configure it with the settings at
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing.
(Always check the surprisingly-helpful docs when you need to do some
config or operations work!)
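For reference, a sketch of what toning scrubbing down (rather than
disabling it) might look like in the `[osd]` section of ceph.conf,
using options from the scrubbing docs linked above. The values here
are purely illustrative, not recommendations; check the defaults for
your release before changing anything:

```
[osd]
    ; at most one concurrent scrub operation per OSD
    osd max scrubs = 1
    ; skip load-triggered scrubs when the load average is above this
    osd scrub load threshold = 0.5
    ; minimum seconds between light scrubs of a PG (example: one day)
    osd scrub min interval = 86400
    ; force a light scrub after this long even under load (example: one week)
    osd scrub max interval = 604800
    ; deep-scrub each PG at most this often (example: one week)
    osd deep scrub interval = 604800
```

The intervals are in seconds; restart the OSDs (or inject the options
at runtime) for the changes to take effect.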
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



