On Mon, Aug 6, 2018 at 5:04 PM Wido den Hollander <wido@xxxxxxxx> wrote:
>
> Hi,
>
> I'm busy with a customer trying to speed up the Influx and Telegraf
> module to gather statistics of their cluster with 2,000 OSDs.
>
> The problem I'm running into is the performance of the Influx module,
> but this seems to boil down to the Mgr daemon.
>
> Gathering and sending all statistics of the cluster takes about 35
> seconds with the current code of the Influx module.
>
> By using iterators, queues and multi-threading I was able to bring this
> down to ~20 seconds, but the main problem is this piece of code:
>
>     for daemon, counters in six.iteritems(self.get_all_perf_counters()):
>         svc_type, svc_id = daemon.split(".", 1)
>         metadata = self.get_metadata(svc_type, svc_id)
>
>         for path, counter_info in counters.items():
>             if counter_info['type'] & self.PERFCOUNTER_HISTOGRAM:
>                 continue
>
> Gathering all the performance counters and metadata of these 2,000
> daemons brings the grand total to about 95k data points.
>
> Influx flushes this within just a few seconds, but it takes the Mgr
> daemon a lot more time to spit them out.
>
> I also see that the ceph-mgr daemon starts to use a lot of CPU when going
> through this.
>
> The Telegraf module also suffers from this as it uses the same code path
> to fetch these counters.
>
> Is there anything we can do better inside the modules? Or something to
> be improved inside the Mgr?

There's definitely room to make get_all_perf_counters *much* more
efficient. It's currently issuing an individual get_counter() call into
C++ land for every counter, and get_counter returns the last N values
into Python before get_latest throws away all but the latest.

I'd suggest implementing a C++ version of get_all_perf_counters. There
will always be some ceiling on how much data is practical in the "one
big endpoint" approach to gathering stats, but if we have a potential
order of magnitude improvement in this call then we should do it.
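
To illustrate the difference, here is a toy model in plain Python. It is
not the real ceph-mgr code: FakeMgrBackend, record(), get_all_latest() and
collect_per_counter() are hypothetical names made up for this sketch, and
the "calls" counter only stands in for crossings between Python and C++.
It compares fetching N samples per counter and keeping one against a
single bulk call that returns only the latest value of each counter.

```python
from collections import deque


class FakeMgrBackend:
    """Toy stand-in for the C++ side; hypothetical, not the ceph-mgr API."""

    def __init__(self, history=20):
        self.history = history
        self.series = {}  # (daemon, path) -> deque of (timestamp, value)
        self.calls = 0    # simulated Python -> C++ crossings

    def record(self, daemon, path, timestamp, value):
        key = (daemon, path)
        samples = self.series.setdefault(key, deque(maxlen=self.history))
        samples.append((timestamp, value))

    def get_counter(self, daemon, path):
        # Per-counter call: copies the whole retained history to the caller.
        self.calls += 1
        return {path: list(self.series.get((daemon, path), []))}

    def get_all_latest(self):
        # Hypothetical bulk call: one crossing, newest sample per counter only.
        self.calls += 1
        result = {}
        for (daemon, path), samples in self.series.items():
            if samples:
                result.setdefault(daemon, {})[path] = samples[-1][1]
        return result


def get_latest(backend, daemon, path):
    # The pattern described above: fetch N samples, keep just the last one.
    data = backend.get_counter(daemon, path)[path]
    return data[-1][1] if data else 0


def collect_per_counter(backend):
    # One get_counter() call per (daemon, counter) pair.
    result = {}
    for daemon, path in list(backend.series):
        result.setdefault(daemon, {})[path] = get_latest(backend, daemon, path)
    return result


backend = FakeMgrBackend()
for osd in range(3):
    for path in ("osd.op_r", "osd.op_w"):
        for t in range(5):
            backend.record("osd.%d" % osd, path, t, t * 10 + osd)

slow = collect_per_counter(backend)
slow_calls = backend.calls

backend.calls = 0
fast = backend.get_all_latest()
fast_calls = backend.calls

print("same result:", slow == fast)
print("per-counter calls:", slow_calls, "bulk calls:", fast_calls)
```

Both paths produce the same dictionary, but the per-counter path makes one
call per counter and copies the full history each time, so its cost grows
with the ~95k data points mentioned above, while the bulk path makes a
single call.
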
John

>
> Wido
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html