On Thu, Dec 26, 2013 at 9:17 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> I think the question comes down to whether Ceph should take some internal
> action based on the information, or whether that is better handled by some
> external monitoring agent. For example, an external agent might collect
> SMART info into graphite, and every so often do some predictive analysis
> and mark out disks that are expected to fail soon.
>
> I'd love to see some consensus form around what this should look like...

My $.02 from the peanut gallery: at a minimum, set the HEALTH_WARN flag if
there is a SMART failure on a physical drive that contains an OSD. Yes, you
could build the monitoring into a separate system, but I think it'd be
really useful to combine it into the cluster health assessment.

--
justin
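A rough sketch of the minimum check suggested above, assuming the SMART status comes from the text that `smartctl -H <device>` prints. The helper names (`smart_passed`, `cluster_health`) and the OSD-to-output mapping are hypothetical, and the sketch only computes a label; it does not show how Ceph itself would raise HEALTH_WARN, which is the open question in this thread.

```python
import re

# Health lines printed by `smartctl -H` for ATA and SCSI devices respectively.
_ATA_RE = re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)")
_SCSI_RE = re.compile(r"SMART Health Status:\s*(\S+)")


def smart_passed(smartctl_output):
    """Return True on pass, False on failure, None if no health line is found."""
    match = _ATA_RE.search(smartctl_output)
    if match:
        return match.group(1) == "PASSED"
    match = _SCSI_RE.search(smartctl_output)
    if match:
        return match.group(1) == "OK"
    return None


def cluster_health(osd_outputs):
    """Hypothetical helper: map {osd_id: smartctl output} to (label, failing OSDs).

    Any drive with a definite SMART failure downgrades the label to
    HEALTH_WARN; drives with no recognizable health line are ignored.
    """
    failing = sorted(
        osd for osd, out in osd_outputs.items() if smart_passed(out) is False
    )
    return ("HEALTH_WARN" if failing else "HEALTH_OK", failing)
```

An external agent could feed the same result into its own alerting instead; the parsing step is identical either way, and only the consumer of the label changes.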