Re: units of metrics

Stefan Kooman <stefan@xxxxxx> · Thu, 12 Sep 2019 17:00:40 +0200

Hi Paul,

Quoting Paul Emmerich (paul.emmerich@xxxxxxxx):
> https://static.croit.io/ceph-training-examples/ceph-training-example-admin-socket.pdf

Thanks for the link. So, what tool do you use to gather the metrics? We
are using the telegraf module of the Ceph manager. However, this module only
provides "sum" and not "avgtime", so I can't calculate the average latency.
The influx and zabbix mgr modules also only provide "sum". The only metrics
module that *does* send enough to derive "avgtime" (both the sum and the
count) is the prometheus module:

ceph_mds_reply_latency_sum
ceph_mds_reply_latency_count

The influx, prometheus and telegraf modules all use "self.get_all_perf_counters()", though:

~/git/ceph/src/pybind/mgr/ > grep -Ri get_all_perf_counters *
dashboard/controllers/perf_counters.py:        return mgr.get_all_perf_counters()
diskprediction_cloud/agent/metrics/ceph_mon_osd.py:        perf_data = obj_api.module.get_all_perf_counters(services=('mon', 'osd'))
influx/module.py:        for daemon, counters in six.iteritems(self.get_all_perf_counters()):
mgr_module.py:    def get_all_perf_counters(self, prio_limit=PRIO_USEFUL,
prometheus/module.py:        for daemon, counters in self.get_all_perf_counters().items():
restful/api/perf.py:        counters = context.instance.get_all_perf_counters()
telegraf/module.py:        for daemon, counters in six.iteritems(self.get_all_perf_counters())

Besides the *ceph mgr* telegraf module, we also use the ceph input plugin for
telegraf, but that plugin does not (yet?) provide MDS metrics.
Ideally we would *only* use the ceph mgr telegraf module to collect *all
the things*.

I'm not sure what difference in the Python code between these modules could explain this.
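To make the question concrete, here is a hypothetical sketch run against a hand-written dict rather than a live mgr. It assumes (not verified here) that get_all_perf_counters() puts a "count" next to "value" for long-running-average counters, as the prometheus _sum/_count pair suggests. It also assumes the flag value PERFCOUNTER_LONGRUNAVG = 4. Under those assumptions, a module would have to emit both fields instead of only the value. The function name and example data are illustrative only:

```python
from typing import Any, Dict, Iterator, Tuple

# Assumed flag value marking long-running-average counters (illustrative).
PERFCOUNTER_LONGRUNAVG = 4


def flatten(counters: Dict[str, Dict[str, Dict[str, Any]]]
            ) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (daemon, counter path, fields) for every counter in the dict."""
    for daemon, paths in counters.items():
        for path, info in paths.items():
            if info['type'] & PERFCOUNTER_LONGRUNAVG:
                # Long-running average: keep the running sum AND the count,
                # otherwise avgtime cannot be derived downstream.
                fields = {'sum': info['value'], 'count': info['count']}
            else:
                fields = {'value': info['value']}
            yield daemon, path, fields


# Hand-written data shaped like the assumed return value (not real output).
example = {
    'mds.a': {
        'mds.reply_latency': {'type': 5, 'value': 13.0, 'count': 1100},
        'mds.request': {'type': 10, 'value': 4242},
    }
}
```

A module that only forwards info['value'] drops the count, which would match the "sum only" behaviour seen in telegraf, influx and zabbix.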

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com