On Thu, 11 May 2017, John Spray wrote:
> On Thu, May 11, 2017 at 12:52 PM, Jan Fajerski <jfajerski@xxxxxxxx> wrote:
> > Hi list,
> > I recently looked into Ceph monitoring with prometheus. There is already a
> > ceph exporter for this purpose here
> > https://github.com/digitalocean/ceph_exporter.
> >
> > Prometheus encourages software projects to instrument their code directly
> > and expose this data, instead of using an external piece of code. Several
> > libraries are provided for this purpose:
> > https://prometheus.io/docs/instrumenting/clientlibs/
> >
> > I think there are arguments for adding this instrumentation to Ceph
> > directly. Generally speaking it should reduce overall complexity in the
> > code (no extra exporter component outside of ceph) and in operations (no
> > extra package and configuration).
> >
> > The direct instrumentation could happen in two places:
> > 1)
> > Directly in Ceph's C++ code using https://github.com/jupp0r/prometheus-cpp.
> > This would mean daemons expose their metrics directly via the prometheus
> > http interface. This would be the most direct way of exposing metrics;
> > prometheus would simply poll all endpoints. Service discovery for scrape
> > targets, say added or removed OSDs, would however have to be handled
> > somewhere. Orchestration tools à la k8s, ansible, salt, ... either have
> > this feature already or it would be simple enough to add. Deployments not
> > using a tool like that need another approach. Prometheus offers various
> > mechanisms
> > (https://prometheus.io/docs/operating/configuration/#%3Cscrape_config%3E) or
> > a ceph component (say mon or mgr) could handle this.
> >
> > 2)
> > Add a ceph-mgr plugin that exposes the metrics available to ceph-mgr as a
> > prometheus scrape target (using
> > https://github.com/prometheus/client_python). This would handle the service
> > discovery issue for ceph daemons out of the box (though not for the actual
> > mgr daemon, which is the scrape target). The code would also be in a central
> > location instead of being scattered in several places. It does however add a
> > (maybe pointless) level of indirection ($ceph_daemon -> ceph-mgr ->
> > prometheus) and adds the need for two different scrape intervals (assuming
> > mgr polls metrics from daemons).
>
> I would love to see a mgr module for prometheus integration!

Me too! It might make more sense to do it in C++ than python, though,
for performance reasons.

sage
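
For reference, option 2 above (one central scrape target in front of the per-daemon counters) could look roughly like the sketch below. This is only an illustration under stated assumptions: it uses the Python standard library rather than client_python so that it stays self-contained, and the counter names, the SAMPLE_COUNTERS snapshot, the ceph_ metric prefix, the ceph_daemon label and the port are all hypothetical, not an existing ceph-mgr interface.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical snapshot of per-daemon counters, standing in for whatever
# ceph-mgr would actually have collected from the daemons.
SAMPLE_COUNTERS = {
    "osd.0": {"op_r": 120, "op_w": 45},
    "osd.1": {"op_r": 98, "op_w": 51},
}


def render_metrics(counters):
    """Render {daemon: {counter: value}} in the Prometheus text format."""
    lines = []
    names = sorted({name for daemon in counters.values() for name in daemon})
    for name in names:
        metric = "ceph_" + name  # assumed naming scheme
        lines.append("# TYPE %s counter" % metric)
        for daemon in sorted(counters):
            if name in counters[daemon]:
                lines.append('%s{ceph_daemon="%s"} %d'
                             % (metric, daemon, counters[daemon][name]))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered counters on /metrics, the path prometheus scrapes."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLE_COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9283):
    """Block forever serving the scrape target; the port is arbitrary."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a prometheus scrape job at that port would give one target for all daemons, with the daemon name carried as a label. That is the indirection ($ceph_daemon -> ceph-mgr -> prometheus) mentioned in the thread, since the values are only as fresh as the last time the snapshot was refreshed.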