Re: Monitoring ceph and prometheus

Sage Weil <sage@xxxxxxxxxxxx> · Thu, 11 May 2017 12:47:21 +0000 (UTC)

On Thu, 11 May 2017, John Spray wrote:
> On Thu, May 11, 2017 at 12:52 PM, Jan Fajerski <jfajerski@xxxxxxxx> wrote:
> > Hi list,
> > I recently looked into Ceph monitoring with prometheus. There is already a
> > ceph exporter for this purpose here
> > https://github.com/digitalocean/ceph_exporter.
> >
> > Prometheus encourages software projects to instrument their code directly
> > and expose this data, instead of using an external piece of code. Several
> > libraries are provided for this purpose:
> > https://prometheus.io/docs/instrumenting/clientlibs/
> >
> > I think there are arguments for adding this instrumentation to Ceph
> > directly.  Generally speaking it should reduce overall complexity in the
> > code (no extra exporter component outside of ceph) and in operations (no
> > extra package and configuration).
> >
> > The direct instrumentation could happen in two places:
> > 1)
> > Directly in Ceph's C++ code using https://github.com/jupp0r/prometheus-cpp.
> > This would mean daemons expose their metrics directly via the prometheus
> > http interface. This would be the most direct way of exposing metrics;
> > prometheus would simply poll all endpoints. Service discovery for scrape
> > targets, say added or removed OSDs, would however have to be handled
> > somewhere. Orchestration tools à la k8s, ansible, salt, ... either have
> > this feature already or it would be simple enough to add. Deployments not
> > using a tool like that need another approach. Prometheus offers various
> > mechanisms
> > (https://prometheus.io/docs/operating/configuration/#%3Cscrape_config%3E) or
> > a ceph component (say mon or mgr) could handle this.
> >
> > 2)
> > Add a ceph-mgr plugin that exposes the metrics available to ceph-mgr as a
> > prometheus scrape target (using
> > https://github.com/prometheus/client_python).  This would handle the service
> > discovery issue for ceph daemons out of the box (though not for the actual
> > mgr-daemon which is the scrape target). The code would also be in a central
> > location instead of being scattered in several places. It does however add a
> > (maybe pointless) level of indirection ($ceph_daemon -> ceph-mgr ->
> > prometheus) and adds the need for two different scrape intervals (assuming
> > mgr polls metrics from daemons).
> 
> I would love to see a mgr module for prometheus integration!

Me too!  It might make more sense to do it in C++ than python, though, for 
performance reasons.
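
For reference, the scrape-target side of option 2 boils down to roughly the
sketch below. It is an illustration only: it uses plain stdlib Python rather
than client_python, and SAMPLE_COUNTERS, the "ceph_" metric prefix, the
"ceph_daemon" label and the port are all made up for the example, not existing
Ceph or ceph-mgr interfaces. A real module would get its counters from the mgr
instead of a hard-coded dict.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for what a mgr module might already hold:
# {daemon name: {counter name: value}}. Names and values are invented.
SAMPLE_COUNTERS = {
    "osd.0": {"op_r": 120, "op_w": 45},
    "osd.1": {"op_r": 98, "op_w": 51},
}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    # Group samples by metric name so each metric gets one TYPE line.
    by_metric = {}
    for daemon, values in sorted(counters.items()):
        for name, value in sorted(values.items()):
            by_metric.setdefault(name, []).append((daemon, value))

    lines = []
    for name in sorted(by_metric):
        metric = "ceph_" + name  # prefix is an assumption of this sketch
        lines.append("# TYPE %s counter" % metric)
        for daemon, value in by_metric[name]:
            lines.append('%s{ceph_daemon="%s"} %s' % (metric, daemon, value))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered counters on /metrics, 404 for anything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLE_COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever serving /metrics; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a prometheus scrape job at that one host:port
would then cover every daemon the mgr knows about, which is the service
discovery advantage mentioned above. The mgr endpoint itself still has to be
configured as a target by hand.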

sage