Re: Ceph Monitoring

Trey Palmer <trey@xxxxxxxxxxxxx> · Tue, 17 Jan 2017 19:31:32 -0500

Just going into production now with a large-ish multisite radosgw setup on 10.2. We are starting off by alerting on anything that isn't HEALTH_OK, just to see how things go. If we get HEALTH_WARN but no mons or OSDs are down, then it will be a low-level alert. We will massage scripts to pick up on different conditions.
We're using graphite via collectd for visualization.
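
For anyone wanting to try the same starting point, the alerting rule above could be sketched roughly as follows. This is only an illustration, not our actual script: the function name, the Alert levels, and the choice to treat every remaining case (HEALTH_ERR, or HEALTH_WARN with a mon or OSD down) as a high-level alert are assumptions. How you obtain the health string and the down counts (for example by parsing the output of the ceph CLI) is deliberately left out.

```python
from enum import Enum


class Alert(Enum):
    """Hypothetical alert levels; the names are illustrative only."""
    NONE = "none"
    LOW = "low"
    HIGH = "high"


def classify_alert(health: str, mons_down: int, osds_down: int) -> Alert:
    """Map a cluster health string plus down counts to an alert level.

    HEALTH_OK raises nothing. HEALTH_WARN with no mons or OSDs down is a
    low-level alert. Everything else is treated as high here, which is an
    assumption rather than something stated in the thread.
    """
    if health == "HEALTH_OK":
        return Alert.NONE
    if health == "HEALTH_WARN" and mons_down == 0 and osds_down == 0:
        return Alert.LOW
    return Alert.HIGH


if __name__ == "__main__":
    # Example inputs; in practice these would come from the cluster status.
    for sample in [("HEALTH_OK", 0, 0), ("HEALTH_WARN", 0, 0), ("HEALTH_WARN", 0, 2)]:
        print(sample, "->", classify_alert(*sample).value)
```

Keeping the decision in one small function makes it easy to add further conditions later without touching whatever collects the status.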

    -- Trey

On Fri, Jan 13, 2017 at 3:15 PM, Chris Jones <cjones@xxxxxxxxxxx> wrote:
General question/survey:

For those of you with larger clusters, how are you doing alerting/monitoring? Meaning, do you trigger off of 'HEALTH_WARN', etc.? I'm not really talking about collectd-related metrics, but more about initial alerts of an issue or potential issue. What threshold do you use, basically? Just trying to get a pulse on what others are doing.

Thanks in advance.  

-- 
Best Regards,
Chris Jones
Bloomberg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
