Re: Monitoring ceph and prometheus

On Fri, 12 May 2017, Brad Hubbard wrote:
> On Thu, May 11, 2017 at 10:47 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Thu, 11 May 2017, John Spray wrote:
> >> On Thu, May 11, 2017 at 12:52 PM, Jan Fajerski <jfajerski@xxxxxxxx> wrote:
> >> > Hi list,
> >> > I recently looked into Ceph monitoring with Prometheus. There is already
> >> > a Ceph exporter for this purpose:
> >> > https://github.com/digitalocean/ceph_exporter.
> >> >
> >> > Prometheus encourages software projects to instrument their code
> >> > directly and expose this data themselves, rather than relying on an
> >> > external exporter. Client libraries are provided for this purpose:
> >> > https://prometheus.io/docs/instrumenting/clientlibs/
> >> >
> >> > I think there are arguments for adding this instrumentation to Ceph
> >> > directly.  Generally speaking, it should reduce overall complexity in
> >> > the code (no extra exporter component outside of Ceph) and in operations
> >> > (no extra package or configuration).
> >> >
> >> > The direct instrumentation could happen in two places:
> >> > 1)
> >> > Directly in Ceph's C++ code, using https://github.com/jupp0r/prometheus-cpp.
> >> > This would mean daemons expose their metrics directly via the Prometheus
> >> > HTTP interface. This is the most direct way of exposing metrics:
> >> > Prometheus would simply poll all endpoints. Service discovery for scrape
> >> > targets (say, added or removed OSDs) would however have to be handled
> >> > somewhere. Orchestration tools à la k8s, ansible, salt, ... either have
> >> > this feature already or could add it easily enough. Deployments not
> >> > using such a tool need another approach: Prometheus offers various
> >> > mechanisms
> >> > (https://prometheus.io/docs/operating/configuration/#%3Cscrape_config%3E),
> >> > or a ceph component (say mon or mgr) could handle this (a rough sketch
> >> > of the file-based mechanism follows option 2 below).
> >> >
> >> > 2)
> >> > Add a ceph-mgr plugin that exposes the metrics available to ceph-mgr as
> >> > a Prometheus scrape target (using
> >> > https://github.com/prometheus/client_python); a sketch follows this
> >> > list. This would handle the service discovery issue for ceph daemons out
> >> > of the box (though not for the mgr daemon itself, which is the scrape
> >> > target). The code would also live in one central location instead of
> >> > being scattered across several places. It does, however, add a (maybe
> >> > pointless) level of indirection ($ceph_daemon -> ceph-mgr -> prometheus)
> >> > and requires two different scrape intervals (assuming the mgr polls
> >> > metrics from the daemons).
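
A minimal sketch of what such a mgr module might look like, assuming
client_python; MgrModule is the real ceph-mgr plugin base class, but
get_all_perf_counters() and the shape of its return value are illustrative
assumptions rather than an existing API:

    # Rough sketch: a ceph-mgr module exposing a Prometheus scrape target
    # via client_python.  get_all_perf_counters() is a placeholder for
    # whatever accessor the mgr plugin API provides.
    import time

    from mgr_module import MgrModule            # ceph-mgr plugin interface
    from prometheus_client import start_http_server, Gauge

    ceph_perf = Gauge('ceph_perf_counter', 'Ceph daemon perf counter value',
                      ['ceph_daemon', 'counter'])

    class Module(MgrModule):
        def serve(self):
            start_http_server(9283)             # port chosen arbitrarily
            while True:
                # assumed shape: {'osd.0': {'some.counter': 1.0, ...}, ...}
                for daemon, counters in self.get_all_perf_counters().items():
                    for name, value in counters.items():
                        ceph_perf.labels(daemon, name).set(value)
                time.sleep(15)                  # the second scrape interval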
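For option 1's discovery problem, one of the mechanisms Prometheus already
offers is file_sd_config: some component (mon, mgr, or the orchestration
tool) keeps a JSON target file up to date and Prometheus re-reads it on
change. A rough sketch; the hostnames, ports, and file path are made up:

    # Emit a Prometheus file_sd target list.  Something (mon, mgr, or an
    # orchestration hook) would rerun this whenever OSDs come or go.
    import json

    targets = [
        {'targets': ['osd-host1:9283', 'osd-host2:9283'],  # invented endpoints
         'labels': {'ceph_daemon_type': 'osd'}},
        {'targets': ['mon-host1:9283'],
         'labels': {'ceph_daemon_type': 'mon'}},
    ]

    with open('/etc/prometheus/ceph_targets.json', 'w') as f:
        json.dump(targets, f)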
> >>
> >> I would love to see a mgr module for prometheus integration!
> >
> > Me too!  It might make more sense to do it in C++ than Python, though, for
> > performance reasons.
> 
> Can we define "metrics" here? What, specifically, are we planning to gather?
> 
> Let's start with an example from "ceph_exporter". It exposes a metric,
> ApplyLatency, which it obtains by connecting to the cluster via a rados
> client connection, running the "osd perf" command, and gathering the
> apply_latency_ms result. I believe this stat is the equivalent of the
> apply_latency perf counter.
> 
> Does the manager currently export the performance counters? If not, option 1
> looks more viable for gathering these sorts of metrics (think "perf dump"),
> unless the manager can proxy calls such as "osd perf" back to the mons?

Right now all of the perf counters are reported to ceph-mgr.  We shouldn't 
need to do 'osd perf' (which just reports the two metrics that the OSDs 
have historically reported to the mon).
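
For a latency-type counter the raw data is an (avgcount, sum) pair, so
whatever exports it can derive a mean over the scrape interval by diffing
two samples; a sketch, assuming that pair shape:

    # Mean latency between two successive samples of the same counter,
    # each shaped like {'avgcount': <op count>, 'sum': <total seconds>}.
    def mean_latency(prev, cur):
        ops = cur['avgcount'] - prev['avgcount']
        if ops <= 0:
            return 0.0
        return (cur['sum'] - prev['sum']) / ops  # seconds per op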
 
> Part of the problem with gathering metrics from Ceph, IMHO, is working out
> which set of metrics you want to collect from the large assortment available.

We could collect them all. Alternatively, we recently introduced a 'priority' 
field, so we could collect everything above a threshold (although then we have 
to go assign meaningful priorities to most of the counters).
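
Filtering on that field could be as simple as the sketch below; the numeric
level mirrors common/perf_counters.h (PRIO_USEFUL = 5 there), though the
per-counter dict shape is an assumption:

    # Keep only counters at or above a chosen priority threshold.
    PRIO_USEFUL = 5                       # mirrors common/perf_counters.h

    def exported_counters(counters, threshold=PRIO_USEFUL):
        # counters: {'name': {'value': ..., 'priority': ...}, ...} (assumed)
        return {name: c for name, c in counters.items()
                if c.get('priority', 0) >= threshold}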

BTW, one of the cool things about Prometheus is that it has a histogram 
type, which means we can take our 2D histogram data and report it 
(flattened onto one dimension or the other).
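
A sketch of that flattening using client_python's custom-collector types;
the 2D grid, bucket bounds, and metric name are invented, and note that
HistogramMetricFamily expects cumulative bucket counts:

    # Collapse a 2D (latency x size) histogram onto the latency axis and
    # emit it as a Prometheus histogram.
    from prometheus_client.core import HistogramMetricFamily

    latency_bounds = [0.001, 0.01, 0.1]    # bucket upper bounds, seconds
    grid = [[4, 2], [7, 1], [3, 0]]        # rows: latency, cols: size

    def collect():
        counts = [sum(row) for row in grid]    # flatten away the size axis
        buckets, cum = [], 0
        for bound, n in zip(latency_bounds, counts):
            cum += n
            buckets.append((str(bound), cum))  # cumulative, per the format
        buckets.append(('+Inf', cum))
        h = HistogramMetricFamily('ceph_op_latency_seconds',
                                  '2D op histogram flattened onto latency')
        # the sum of observations isn't recoverable from bucket counts;
        # report 0 for this sketch
        h.add_metric([], buckets, 0.0)
        yield h

Registering an object whose collect() does this (via client_python's
REGISTRY.register()) puts the histogram on the scrape endpoint.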

sage
