On Thu, Jun 28, 2018 at 11:49 PM Paul Cuzner <pcuzner@xxxxxxxxxx> wrote:
>
> Just to add my 0.02c.
>
> I think there are really two layers of alerting - state alerting, and
> trend based alerting (time series). State alerting is where I'd see a
> mgr module adding value, whereas trend based alerting is more likely
> to sit outside ceph within prometheus, zabbix, influx etc
>
> I also don't think alert management (snoozing, muting etc) should fall
> to Ceph - let the monitoring/alert layer handle that. This keeps
> things simple(ish) and helps define the 'alert' role as a health-check
> and notifier, leaving more advanced controls to higher levels in the
> monitoring stack.

Part of me feels the same way, but I'm also conscious that there are
downsides: if we rely on higher layers to do snoozing, then
 - it prevents us from building that "snooze" button into the dashboard
 - the alert will still be active in "ceph status" even if it's
   filtered somewhere else

The main use case I've heard for doing snoozing/muting is that some
people monitor the overall HEALTH_OK of their Ceph cluster (not
anything finer grained). When there is a health check they don't care
about, they want to mute it to get their external monitoring green
again. For those people, if we say that any muting is an external
job, then we're kind of forcing them to monitor Ceph at a finer level
of detail than they really want to.

John

>
> I've been thinking about a "notifier" mgr module to fulfill the
> state-based alerting, based around the notion of notification
> channels (similar to Grafana).
> The idea being that when a problem is
> seen the notifier calls the send_alert method of the channel, allowing
> multiple channels to be notified (UI, SNMP, etc)
>
> On Wed, Jun 27, 2018 at 10:45 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> >
> > On Mon, Jun 25, 2018 at 3:55 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> > > Hi all,
> > >
> > > Recently I've heard from a few different people about needs to have
> > > nicer alerting in Ceph, both for GUIs and for emitting alerts
> > > externally (e.g. over SNMP). I'm keen to make sure we get the right
> > > common bits in, to avoid modules having to do their own thing too
> > > much.
> > >
> > > Points that have come up recently:
> > > - How to integrate Ceph health checks with alerts generated in Prometheus?
> > > - Filtering/muting particular health checks
> >
> > +snoozing
> >
> > --
> > Patrick Donnelly
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
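
For illustration only, the notification-channel idea quoted above (a
notifier that calls send_alert on each configured channel when a problem
is seen) could be sketched roughly as below. Only the send_alert name
comes from the thread. Alert, Channel, LogChannel, Notifier and update()
are hypothetical names made up for this sketch and are not part of any
existing ceph-mgr module API. The sketch also assumes the notifier is
handed the current set of health checks as a plain dict.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record built from one health check."""
    check: str     # health check code, e.g. "OSD_DOWN"
    severity: str  # e.g. "HEALTH_WARN"
    summary: str   # human-readable description


class Channel(ABC):
    """A notification channel (UI, SNMP, ...); hypothetical interface."""

    @abstractmethod
    def send_alert(self, alert: Alert) -> None:
        ...


class LogChannel(Channel):
    """Stand-in channel that just records formatted messages in memory."""

    def __init__(self):
        self.sent = []

    def send_alert(self, alert: Alert) -> None:
        self.sent.append(f"[{alert.severity}] {alert.check}: {alert.summary}")


class Notifier:
    """State-based notifier: alerts once when a check becomes active."""

    def __init__(self, channels):
        self.channels = list(channels)
        self._active = set()  # check codes seen on the previous update

    def update(self, checks):
        """Take {code: (severity, summary)} and return newly raised alerts."""
        raised = []
        for code in sorted(checks):
            if code in self._active:
                continue  # already notified while this check stayed active
            severity, summary = checks[code]
            alert = Alert(code, severity, summary)
            for channel in self.channels:
                channel.send_alert(alert)
            raised.append(alert)
        # Checks that disappeared are forgotten, so they alert again
        # if they come back later.
        self._active = set(checks)
        return raised
```

Calling update() repeatedly with the same active checks sends nothing
new, so each channel sees one alert per state change rather than one per
poll. Snoozing or muting is deliberately left out, since where that
should live is exactly the open question in this thread.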