Just to add my 0.02c.

I think there are really two layers of alerting: state alerting and trend-based alerting (time series). State alerting is where I'd see a mgr module adding value, whereas trend-based alerting is more likely to sit outside Ceph, within Prometheus, Zabbix, Influx, etc.

I also don't think alert management (snoozing, muting, etc.) should fall to Ceph; let the monitoring/alert layer handle that. This keeps things simple(ish) and helps define the 'alert' role as a health check and notifier, leaving more advanced controls to higher levels in the monitoring stack.

I've been thinking about a "notifier" mgr module to fulfill the state-based alerting, based around the notion of notification channels (similar to Grafana). The idea is that when a problem is seen, the notifier calls the send_alert method of each channel, allowing multiple channels to be notified (UI, SNMP, etc.).

On Wed, Jun 27, 2018 at 10:45 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> On Mon, Jun 25, 2018 at 3:55 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> > Hi all,
> >
> > Recently I've heard from a few different people about needs to have
> > nicer alerting in Ceph, both for GUIs and for emitting alerts
> > externally (e.g. over SNMP). I'm keen to make sure we get the right
> > common bits in, to avoid modules having to do their own thing too
> > much.
> >
> > Points that have come up recently:
> > - How to integrate Ceph health checks with alerts generated in Prometheus?
> > - Filtering/muting particular health checks
>
> +snoozing
>
> --
> Patrick Donnelly
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
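
To make the notifier/channel idea from my reply a bit more concrete, here is a rough standalone sketch. It is not written against the real mgr module API: the Alert, Channel, LogChannel and Notifier names are all made up for illustration, and the "only notify when a check newly appears" rule is my assumption about what state-based alerting would mean here. Only send_alert comes from the proposal itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    check: str     # health check code, e.g. "OSD_DOWN"
    severity: str  # e.g. "HEALTH_WARN"
    summary: str


class Channel(ABC):
    """A notification channel (UI, SNMP, ...), similar in spirit to Grafana's."""

    @abstractmethod
    def send_alert(self, alert: Alert) -> None:
        ...


class LogChannel(Channel):
    """Stand-in for a real channel: records the message instead of sending it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent: List[str] = []

    def send_alert(self, alert: Alert) -> None:
        self.sent.append(
            f"[{self.name}] {alert.severity} {alert.check}: {alert.summary}"
        )


class Notifier:
    """Fans newly raised checks out to every channel.

    Deliberately has no muting or snoozing; that is left to the
    monitoring/alert layer above.
    """

    def __init__(self, channels: Iterable[Channel]) -> None:
        self.channels = list(channels)
        self._active = set()  # check codes seen on the previous update

    def update(self, alerts: Iterable[Alert]) -> List[Alert]:
        """Take the currently failing checks and notify on the new ones."""
        current = {a.check: a for a in alerts}
        new = [a for code, a in current.items() if code not in self._active]
        for alert in new:
            for channel in self.channels:
                try:
                    channel.send_alert(alert)
                except Exception:
                    # One broken channel should not block the others.
                    continue
        self._active = set(current)
        return new


ui, snmp = LogChannel("ui"), LogChannel("snmp")
notifier = Notifier([ui, snmp])
osd_down = Alert("OSD_DOWN", "HEALTH_WARN", "1 osds down")

first = notifier.update([osd_down])   # check newly raised: both channels notified
second = notifier.update([osd_down])  # state unchanged: nothing sent
third = notifier.update([])           # check cleared: tracked state is reset
```

A real channel would replace LogChannel with something that emits an SNMP trap or pushes to the UI. The notifier itself only tracks which checks are currently raised, so it stays a plain health-check-and-notify layer.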