Re: Improving alerting/health checks

Paul Cuzner <pcuzner@xxxxxxxxxx> · Fri, 29 Jun 2018 10:49:15 +1200

Just to add my 0.02c.

I think there are really two layers of alerting: state alerting and
trend-based alerting (time series). State alerting is where I'd see a
mgr module adding value, whereas trend-based alerting is more likely
to sit outside Ceph, within Prometheus, Zabbix, InfluxDB, etc.

I also don't think alert management (snoozing, muting etc) should fall
to Ceph - let the monitoring/alert layer handle that. This keeps
things simple(ish) and helps define the 'alert' role as a health-check
and notifier, leaving more advanced controls to higher levels in the
monitoring stack.

I've been thinking about a "notifier" mgr module to provide the
state-based alerting, built around the notion of notification
channels (similar to Grafana). The idea is that when a problem is
seen, the notifier calls the send_alert method of each channel, allowing
multiple channels to be notified (UI, SNMP, etc.).
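
As a rough illustration of the channel idea (plain Python, not actual
mgr module code): only send_alert comes from the description above.
Alert, Channel, LogChannel, Notifier and update() are hypothetical
names made up for this sketch, and the health check values are just
examples.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record passed to channels."""
    check: str      # health check code, e.g. "OSD_DOWN"
    severity: str   # e.g. "HEALTH_WARN"
    summary: str


class Channel(ABC):
    """Hypothetical notification channel (UI, SNMP, ...)."""

    @abstractmethod
    def send_alert(self, alert: Alert) -> None:
        ...


class LogChannel(Channel):
    """Stand-in channel that only records what it was sent."""

    def __init__(self) -> None:
        self.sent: List[Alert] = []

    def send_alert(self, alert: Alert) -> None:
        self.sent.append(alert)


class Notifier:
    """Fans state changes out to every registered channel."""

    def __init__(self, channels: List[Channel]) -> None:
        self.channels = channels
        self.active: Dict[str, Alert] = {}

    def update(self, checks: Dict[str, Alert]) -> None:
        # Notify only for checks that are new or whose severity changed.
        # Muting and snoozing are left to the monitoring layer above.
        for code, alert in checks.items():
            previous = self.active.get(code)
            if previous is None or previous.severity != alert.severity:
                for channel in self.channels:
                    channel.send_alert(alert)
        # Remember the current state for the next comparison.
        self.active = dict(checks)
```

Calling update() twice with the same failing check would notify each
channel once; the second call sees no state change and stays quiet.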

On Wed, Jun 27, 2018 at 10:45 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> On Mon, Jun 25, 2018 at 3:55 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> > Hi all,
> >
> > Recently I've heard from a few different people about needs to have
> > nicer alerting in Ceph, both for GUIs and for emitting alerts
> > externally (e.g. over SNMP).  I'm keen to make sure we get the right
> > common bits in, to avoid modules having to do their own thing too
> > much.
> >
> > Points that have come up recently:
> >  - How to integrate Ceph health checks with alerts generated in Prometheus?
> >  - Filtering/muting particular health checks
>
> +snoozing
>
> --
> Patrick Donnelly
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html