Interesting paper at FAST:

https://www.usenix.org/system/files/conference/fast15/fast15-paper-ma.pdf

Short version: reallocated sector counts correlate with impending disk failures (this sounds like what Sandon has been telling us for ages). By preemptively replacing disks that showed signs of impending failure, EMC reduced its rate of triple failures by 80%, and looking at the joint failure probability within each RAID set reduced the failure rate by 98%.

We wouldn't see quite the same results, since our "RAID sets" are effectively entire pools, but this seems like a strong case for adding SMART monitoring to the OSDs or to Calamari now and doing some preemptive disk replacement.

sage
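To make the monitoring idea a bit more concrete, here is a rough sketch of the kind of check an OSD-side or Calamari-side monitor might run. It parses the attribute table printed by `smartctl -A` and flags a disk once the raw Reallocated_Sector_Ct value crosses a threshold. The function names, the sample output, and the threshold value are all assumptions made up for illustration; the paper's actual cutoffs and nothing in Ceph or Calamari are implied here.

```python
import re

# Hypothetical cutoff, chosen only for illustration (not taken from the paper).
REALLOC_THRESHOLD = 100


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if (len(fields) >= 10 and fields[0] == "5"
                and fields[1] == "Reallocated_Sector_Ct"):
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def should_replace(smartctl_output, threshold=REALLOC_THRESHOLD):
    """True if the disk reports at least `threshold` reallocated sectors."""
    count = reallocated_sectors(smartctl_output)
    return count is not None and count >= threshold


# Made-up sample in the usual `smartctl -A` table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       152
"""

if __name__ == "__main__":
    print(reallocated_sectors(SAMPLE), should_replace(SAMPLE))
```

In practice the text would come from running `smartctl -A` against each OSD's data device, and the threshold would need tuning per drive model. Since our failure domain is a whole pool rather than a small RAID set, the per-disk flag is probably the more directly useful half of the paper's approach.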