I have just discussed this topic with Sage, who pointed out that this functionality should (and possibly will) be a core component of Ceph. I've put something together with a custom Nagios plug-in, Icinga2, InfluxDB and Grafana, which appears to work rather reliably. A blog post is available at http://www.datacentred.co.uk/blog/integrating-icinga2-with-influxdb-and-grafana/ for inspiration.
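For anyone who wants a starting point, here is a minimal sketch of what a Nagios-style check for Ceph could look like. It is not the plug-in from the blog post; it only assumes that the `ceph health` command is on the path and prints a line starting with HEALTH_OK, HEALTH_WARN or HEALTH_ERR, and it maps that to the usual Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names `evaluate_health` and `main` are made up for illustration.

```python
import subprocess

# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Map Ceph's overall health status to a Nagios state (code, label).
STATE_FOR_STATUS = {
    "HEALTH_OK": (OK, "OK"),
    "HEALTH_WARN": (WARNING, "WARNING"),
    "HEALTH_ERR": (CRITICAL, "CRITICAL"),
}


def evaluate_health(health_text):
    """Turn the text printed by `ceph health` into (exit_code, message)."""
    words = health_text.split()
    status = words[0] if words else ""
    # Anything unrecognised (including empty output) is reported as UNKNOWN.
    code, label = STATE_FOR_STATUS.get(status, (UNKNOWN, "UNKNOWN"))
    detail = health_text.strip() or "no output from ceph"
    return code, f"CEPH {label} - {detail}"


def main():
    """Run `ceph health`, print one status line, return the exit code."""
    try:
        result = subprocess.run(
            ["ceph", "health"], capture_output=True, text=True, timeout=30
        )
        output = result.stdout
    except (OSError, subprocess.TimeoutExpired):
        # Command missing or hung: fall through with empty output (UNKNOWN).
        output = ""
    code, message = evaluate_health(output)
    print(message)
    return code
```

A wrapper script would finish with `sys.exit(main())` so that Icinga2 or Nagios picks up the state from the exit code. Keeping the parsing in `evaluate_health` means it can be tested without a running cluster.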
In the interests of sharing and caring, it'd be nice if other operators could offer some insight into their best practices for spotting when things are about to go wrong, so we can all be proactive about maintaining a stable service.