Re: [ceph-users] stalls caused by scrub on jewel

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Tue, 6 Dec 2016 11:09:33 +0100

Hi Sage,

Could you please clarify: do we need to set nodeep-scrub also, or does
this somehow only affect the (shallow) scrub?

(Note that deep scrubs will still start once the deep_scrub_interval has
passed, even with noscrub set.)
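
In case it is useful while waiting for the answer, here is a minimal sketch
of pausing both kinds of scrub, assuming nodeep-scrub does turn out to be
needed. The flags and the `ceph osd set` / `ceph osd unset` commands are the
standard ones; CEPH_BIN and the two function names are made up for
illustration only.

```shell
# Sketch only: CEPH_BIN is a hypothetical override so the commands can be
# dry-run (e.g. CEPH_BIN=echo); by default the real ceph CLI is used.
CEPH_BIN="${CEPH_BIN:-ceph}"

# Set both cluster-wide flags: noscrub for shallow scrubs,
# nodeep-scrub for deep scrubs.
pause_scrubs() {
    "$CEPH_BIN" osd set noscrub &&
    "$CEPH_BIN" osd set nodeep-scrub
}

# Clear both flags again, e.g. once a fixed release is installed.
resume_scrubs() {
    "$CEPH_BIN" osd unset noscrub &&
    "$CEPH_BIN" osd unset nodeep-scrub
}
```

Running pause_scrubs on a node with an admin keyring should set both flags;
resume_scrubs undoes it.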

Cheers, Dan

On Tue, Nov 15, 2016 at 11:35 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> Hi everyone,
>
> There was a regression in jewel that can trigger long OSD stalls during
> scrub.  How long the stalls are depends on how many objects are in your
> PGs, how fast your storage device is, and what is cached, but in at least
> one case they were long enough that the OSD internal heartbeat check
> failed and the OSD committed suicide (after 120 seconds).
>
> The workaround for now is to simply
>
>  ceph osd set noscrub
>
> as the bug is only triggered by scrub.  A fix is being tested and will be
> available shortly.
>
> If you've seen any kind of weird latencies or slow requests on jewel, I
> suggest setting noscrub and seeing if they go away!
>
> The tracker bug is
>
>  http://tracker.ceph.com/issues/17859
>
> Big thanks to Yoann Moulin for helping track this down!
>
> sage