Hi Paul,

Quoting Paul Emmerich (paul.emmerich@xxxxxxxx):
> https://static.croit.io/ceph-training-examples/ceph-training-example-admin-socket.pdf

Thanks for the link. So, what tool do you use to gather the metrics? We
are using the telegraf module of the Ceph manager. However, this module
only provides "sum" and not "avgtime", so I can't do the calculations.
The influx and zabbix mgr modules also only provide "sum". The only
metrics module that *does* send enough to derive "avgtime" is the
prometheus module, which exports both a sum and a count:

ceph_mds_reply_latency_sum
ceph_mds_reply_latency_count

All modules use "self.get_all_perf_counters()" though:

~/git/ceph/src/pybind/mgr/ > grep -Ri get_all_perf_counters *
dashboard/controllers/perf_counters.py: return mgr.get_all_perf_counters()
diskprediction_cloud/agent/metrics/ceph_mon_osd.py: perf_data = obj_api.module.get_all_perf_counters(services=('mon', 'osd'))
influx/module.py: for daemon, counters in six.iteritems(self.get_all_perf_counters()):
mgr_module.py: def get_all_perf_counters(self, prio_limit=PRIO_USEFUL,
prometheus/module.py: for daemon, counters in self.get_all_perf_counters().items():
restful/api/perf.py: counters = context.instance.get_all_perf_counters()
telegraf/module.py: for daemon, counters in six.iteritems(self.get_all_perf_counters())

Besides the *ceph* telegraf module we also use the ceph plugin for
telegraf ... but that plugin does not (yet?) provide mds metrics.
Ideally we would *only* use the ceph mgr telegraf module to collect *all
the things*.

I'm not sure what difference in the Python code between the modules
could explain this.

Regards,
Stefan

-- 
| BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6  +31 318 648 688 / info@xxxxxx
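
For reference, a minimal sketch of the calculation that a sum/count pair makes possible and that "sum" alone does not. The LatencySample type, the function names and the sample values are hypothetical and made up for illustration. It assumes that both values are monotonically increasing counters, as the ceph_mds_reply_latency_sum and ceph_mds_reply_latency_count pair suggests, and it is not taken from any of the mgr modules.

```python
from typing import NamedTuple, Optional


class LatencySample(NamedTuple):
    """One reading of a sum/count latency pair (hypothetical container)."""
    total: float  # accumulated latency, e.g. ceph_mds_reply_latency_sum
    count: int    # number of operations, e.g. ceph_mds_reply_latency_count


def lifetime_average(sample: LatencySample) -> Optional[float]:
    """Average latency since the counters started: sum / count."""
    if sample.count == 0:
        return None  # no operations recorded yet
    return sample.total / sample.count


def interval_average(prev: LatencySample, curr: LatencySample) -> Optional[float]:
    """Average latency of the operations seen between two readings."""
    d_count = curr.count - prev.count
    d_total = curr.total - prev.total
    # No new operations, or the counters went backwards (assumed to mean
    # a daemon restart): there is no meaningful average for this interval.
    if d_count <= 0 or d_total < 0:
        return None
    return d_total / d_count


# Made-up readings, purely to show the arithmetic.
prev = LatencySample(total=12.5, count=1000)
curr = LatencySample(total=14.0, count=1500)

print(lifetime_average(curr))
print(interval_average(prev, curr))
```

The interval form is usually the more useful one for graphing, since the lifetime average flattens out as the counters grow. With only "sum" available, neither can be computed, because the divisor is missing.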