disk failure prediction

Interesting paper at FAST:

	https://www.usenix.org/system/files/conference/fast15/fast15-paper-ma.pdf

Short version: reallocated sector counts correlate with impending disk 
failures (this sounds like what Sandon has been telling us for ages). 
Preemptively replacing disks with impending failures reduced EMC's rate 
of triple failures by 80%, and looking at the joint failure probability 
within each RAID set reduces the failure rate by 98%.  We wouldn't see 
quite the same results since our "RAID sets" are effectively entire pools, 
but this seems like a strong case for adding SMART monitoring to the OSDs 
or to calamari already and doing some preemptive disk replacement.
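
For illustration only, here is a rough sketch of what a per-disk check 
might look like.  It parses the attribute table printed by "smartctl -A" 
and flags a disk once the raw Reallocated_Sector_Ct (attribute 5) passes 
a cutoff.  The function names, the sample output, and the threshold of 
100 are all made up for this example; the threshold is not taken from 
the paper, and nothing here is existing OSD or calamari code.

```python
import subprocess

# Hypothetical cutoff for illustration; tune per drive model and fleet data.
REALLOC_THRESHOLD = 100

# Made-up sample in the column layout that "smartctl -A" prints for ATA disks.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       120
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25712
"""


def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; RAW_VALUE is the last one.
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def should_replace(smartctl_output, threshold=REALLOC_THRESHOLD):
    """True when the reallocated sector count has reached the cutoff."""
    count = reallocated_sectors(smartctl_output)
    return count is not None and count >= threshold


def read_smart(device):
    """Run smartctl on a device such as /dev/sda (needs root and smartmontools)."""
    # smartctl uses its exit status as a bitmask, so a non-zero code is not fatal.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(should_replace(SAMPLE))
```

On a real host one would call should_replace(read_smart("/dev/sda")) for 
each OSD data disk and report the result to whatever does the monitoring.  
A single fixed cutoff is the simplest possible policy; it does not 
attempt the per-RAID-set joint failure probability mentioned above.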

sage



