Hi folks,

After much poking around with rules etc., I think I've realised how one is meant to do this kind of thing.

Prometheus queries have a powerful join-like capability that can be used for things like selecting instances running a particular version of software, without putting the version as a label on everything you want to filter:
https://www.robustperception.io/exposing-the-software-version-to-prometheus/

We can use that to establish a connection between disks and OSDs by emitting special metrics like this:

    ceph_disk_occupation{instance="mynode456", device="sdc", osd="123"}

Then someone writing a dashboard can get the metrics for the disk backing a known OSD ID with a query like this:

    node_disk_bytes_written and on (device, instance) ceph_disk_occupation{osd="123"}

One can also conveniently get a metric for all the disks that are in use by any Ceph OSD (but not every disk in the system) by leaving off the osd="123" part. I think we can also add rack= labels (or arbitrary crush ancestry) to these metrics, so that a UI can query disk/network stats by crush hierarchy.

There is a constraint: we need the instance= labels to match up between the metrics describing the OSD disk/nic usage and the metrics that come from node_exporter. By default the instance labels are the IP/host:port of an exporter, so one has to override them in the prometheus config to be just the hostname. That seems to be something some people do anyway to get cleaner instance labels in UIs.

I'm going to extend the ceph-mgr prometheus module to emit these metrics. To do the same trick for network cards as for disks, we need the OSDs to tell us their interface names, so I have a PR for that:
https://github.com/ceph/ceph/pull/16941

Cheers,
John
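P.S. To make the join concrete, here's roughly what the "all disks in use by any OSD" query could look like once the ceph_disk_occupation metrics exist. The second query assumes the hypothetical rack= label mentioned above, which nothing emits yet:

    # write throughput for every disk that backs some Ceph OSD
    rate(node_disk_bytes_written[5m]) and on (device, instance) ceph_disk_occupation

    # the same, restricted to one (hypothetical) crush rack
    rate(node_disk_bytes_written[5m]) and on (device, instance) ceph_disk_occupation{rack="r1"}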
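Also, for the instance-label override: a minimal sketch of the kind of relabelling that would do it, assuming a node_exporter scrape job -- the job name, target and regex here are just examples, not anything shipped:

    scrape_configs:
      - job_name: 'node'
        static_configs:
          - targets: ['mynode456:9100']
        relabel_configs:
          # rewrite instance from "mynode456:9100" to just "mynode456"
          - source_labels: [__address__]
            regex: '([^:]+).*'
            target_label: instance
            replacement: '${1}'

The ceph-mgr module would then emit instance labels that are bare hostnames as well, so the two sides of the join agree.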