mgr+Prometheus/grafana (+consul)

I recently configured Prometheus to scrape the mgr /metrics endpoint and
added Grafana dashboards. All daemons are at 15.2.11.

I use HashiCorp Consul to advertise the active mgr in DNS, and Prometheus
points at a single DNS target. (Is anyone else using this method, or is
everyone just statically pointing Prometheus at all potentially active
managers?)

Everything worked for the first couple of days, and it's *mostly* still
working. Then a few rate metrics stopped showing meaningful values: they
are essentially pegged at zero, which is implausible in a healthy cluster.
Some cluster maintenance was occurring, such as marking out and recreating
some OSDs, so I have a baseline for throughput and recovery.

Metric graphs that stopped functioning:
Throughput: ceph_osd_op_r_out_bytes, ceph_osd_op_w_in_bytes,
ceph_osd_op_rw_in_bytes
Recovery: ceph_osd_recovery_ops

I can see that Grafana output is using this method of converting the
counters to rates:
sum(irate(ceph_osd_recovery_ops{job="$job"}[$interval]))
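
For reference, here is a rough Python sketch of what irate() computes for a
single series, going by the Prometheus documentation: it looks only at the
last two samples inside the range window and divides the increase by the
time between them. The function name and the sample data are made up for
illustration, and this is not Prometheus's actual code. One consequence is
that if the last two samples carry the same counter value (for example, if
the exporter refreshed its values less often than it was scraped, which is
only an assumption here and not something confirmed for this cluster), the
result is 0 even though the counter is climbing over a longer span.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate_sketch(samples: List[Sample]) -> Optional[float]:
    """Approximate irate() for one series: rate from the last two samples."""
    if len(samples) < 2:
        return None  # fewer than two samples in the window: no result
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    dt = t_last - t_prev
    if dt <= 0:
        return None
    # A drop in value is treated as a counter reset: count up from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / dt


# Hypothetical data: scraped every 15 s, but values only refreshed every 30 s.
stale = [(0.0, 100.0), (15.0, 100.0), (30.0, 160.0), (45.0, 160.0)]
# Hypothetical data: same counter, refreshed on every scrape.
fresh = [(0.0, 100.0), (15.0, 130.0), (30.0, 160.0), (45.0, 190.0)]

print(irate_sketch(stale))  # 0.0
print(irate_sketch(fresh))  # 2.0
```

sum() then just adds these per-series results together, so a panel of all
zeros would be consistent with every series hitting the flat case at
evaluation time.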

The underlying counters appear to be sane, and the raw values read directly
from Prometheus also look valid, so I'm guessing the failure is in either
the irate or the sum function? Inspecting the queries in Grafana, they
return correct timestamps with zero values, which leaves "sum(irate)" as
the likely source of the problem.
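
One way to narrow this down is to look at the raw samples for a single
series and check how often consecutive samples repeat the same value. The
sketch below assumes the samples come from the "values" array that the
Prometheus HTTP API returns for a range selector such as
ceph_osd_recovery_ops[5m], where each entry is a [timestamp, "value"] pair
with the value encoded as a string. The helper name and the sample data are
hypothetical.

```python
from typing import Sequence, Tuple


def flat_fraction(values: Sequence[Tuple[float, str]]) -> float:
    """Fraction of consecutive sample pairs whose counter value is unchanged."""
    nums = [float(v) for _, v in values]
    pairs = list(zip(nums, nums[1:]))
    if not pairs:
        return 0.0  # zero or one sample: nothing to compare
    unchanged = sum(1 for a, b in pairs if a == b)
    return unchanged / len(pairs)


# Hypothetical "values" array for one OSD's counter.
raw = [
    (1620000000, "100"),
    (1620000015, "100"),
    (1620000030, "160"),
    (1620000045, "160"),
]

# Two of the three consecutive pairs are flat in this made-up series.
print(f"{flat_fraction(raw):.2f}")
```

A high fraction on a counter that is clearly growing over minutes would
suggest the last two samples are often identical, which would make irate
return zero regardless of sum.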

Does anyone have experience with this? I admit it is possibly tangential
to Ceph itself, but as the Prometheus/Grafana integration is more or less
supported, I thought I'd try here first.


-- 
Jeremy Austin
jhaustin@xxxxxxxxx



