Hi,

I'm working with a customer to speed up the Influx and Telegraf modules, which gather statistics from their cluster of 2,000 OSDs.

The problem I'm running into is the performance of the Influx module, but this seems to boil down to the Mgr daemon.

Gathering and sending all statistics of the cluster takes about 35 seconds with the current code of the Influx module. By using iterators, queues and multi-threading I was able to bring this down to ~20 seconds, but the main problem is this piece of code:

    for daemon, counters in six.iteritems(self.get_all_perf_counters()):
        svc_type, svc_id = daemon.split(".", 1)
        metadata = self.get_metadata(svc_type, svc_id)

        for path, counter_info in counters.items():
            if counter_info['type'] & self.PERFCOUNTER_HISTOGRAM:
                continue

Gathering all the performance counters and metadata of these 2,000 daemons brings the grand total to about 95k data points.

Influx flushes these within just a few seconds, but it takes the Mgr daemon a lot more time to spit them out. I also see that the ceph-mgr daemon starts to use a lot of CPU while going through this.

The Telegraf module also suffers from this, as it uses the same code path to fetch these counters.

Is there anything we can do better inside the modules? Or is there something to be improved inside the Mgr?

Wido