Re: SMART monitoring

Justin Erenkrantz <justin@xxxxxxxxxxxxxx> · Fri, 27 Dec 2013 11:15:38 -0500



On Thu, Dec 26, 2013 at 9:17 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> I think the question comes down to whether Ceph should take some internal
> action based on the information, or whether that is better handled by some
> external monitoring agent.  For example, an external agent might collect
> SMART info into graphite, and every so often do some predictive analysis
> and mark out disks that are expected to fail soon.
>
> I'd love to see some consensus form around what this should look like...

My $.02 from the peanut gallery: at a minimum, set the HEALTH_WARN flag if
there is a SMART failure on a physical drive that contains an OSD.  Yes,
you could build the monitoring into a separate system, but I think it'd be
really useful to combine it into the cluster health assessment.

-- justin
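
A minimal sketch of the check suggested above, to make the idea concrete.
It is not Ceph code: the function names (smart_passed, read_smart, assess)
and the OSD-to-device mapping are hypothetical, made up for illustration.
The one thing taken from a real tool is the verdict line that
`smartctl -H` prints ("SMART overall-health self-assessment test result:"
for ATA drives, "SMART Health Status:" for SCSI drives).

```python
import re
import subprocess

# Verdict lines printed by `smartctl -H` for ATA and SCSI devices.
_ATA = re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)")
_SCSI = re.compile(r"SMART Health Status:\s*(\S+)")


def smart_passed(output):
    """Return True or False for a pass/fail verdict, None if none is found."""
    m = _ATA.search(output)
    if m:
        return m.group(1) == "PASSED"
    m = _SCSI.search(output)
    if m:
        return m.group(1) == "OK"
    return None


def read_smart(device):
    """Run `smartctl -H` on a device and return its text output."""
    # smartctl sets non-zero exit bits for failing disks, so the return
    # code is deliberately not treated as an error here.
    proc = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return proc.stdout


def assess(osd_outputs):
    """Map {osd name: smartctl output} to (status, list of failing OSDs).

    Hypothetical policy: any explicit SMART failure on a drive backing an
    OSD yields HEALTH_WARN; an unreadable verdict is ignored.
    """
    failing = sorted(
        osd for osd, out in osd_outputs.items() if smart_passed(out) is False
    )
    return ("HEALTH_WARN" if failing else "HEALTH_OK", failing)
```

Usage would be something like
assess({"osd.0": read_smart("/dev/sda"), "osd.1": read_smart("/dev/sdb")}),
where the device paths are placeholders.  Whether a check like this lives
inside Ceph or in an external agent is exactly the open question in this
thread; the sketch only shows that the mapping from a SMART verdict to a
health flag is small.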