Hi everyone,

There was a regression in jewel that can trigger long OSD stalls during scrub. How long the stalls last depends on how many objects are in your PGs, how fast your storage device is, and what is cached, but in at least one case they were long enough that the OSD's internal heartbeat check failed and the OSD committed suicide (120 seconds).

The workaround for now is to simply run

    ceph osd set noscrub

as the bug is only triggered by scrub. A fix is being tested and will be available shortly.

If you've seen any kind of weird latencies or slow requests on jewel, I suggest setting noscrub and seeing if they go away!

The tracker bug is

    http://tracker.ceph.com/issues/17859

Big thanks to Yoann Moulin for helping track this down!

sage
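
For reference, the noscrub workaround can be wrapped in a few small shell helpers. This is only a sketch: the function names (pause_scrub, show_flags, resume_scrub) are made up for illustration and are not part of Ceph, and it assumes the ceph CLI is on your PATH with admin credentials. The message above only mentions setting noscrub; the unset and flag check are the usual companions to that command, not something the tracker bug prescribes.

```shell
# Hypothetical helper names; only the `ceph osd ...` commands are real.

# Pause scrubbing cluster-wide (the workaround described above).
pause_scrub() {
    ceph osd set noscrub
}

# Print the flags line from the OSD map; noscrub should be listed once set.
show_flags() {
    ceph osd dump | grep '^flags'
}

# Re-enable scrubbing, e.g. after upgrading to a build that has the fix.
resume_scrub() {
    ceph osd unset noscrub
}
```

Typical use would be to call pause_scrub, confirm with show_flags, watch whether the slow requests stop, and call resume_scrub once the fixed packages are installed.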