On 08/06/2018 06:24 PM, John Spray wrote:
> On Mon, Aug 6, 2018 at 5:04 PM Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>> Hi,
>>
>> I'm busy with a customer trying to speed up the Influx and Telegraf
>> modules to gather statistics of their cluster with 2,000 OSDs.
>>
>> The problem I'm running into is the performance of the Influx module,
>> but this seems to boil down to the Mgr daemon.
>>
>> Gathering and sending all statistics of the cluster takes about 35
>> seconds with the current code of the Influx module.
>>
>> By using iterators, queues and multi-threading I was able to bring this
>> down to ~20 seconds, but the main problem is this piece of code:
>>
>> for daemon, counters in six.iteritems(self.get_all_perf_counters()):
>>     svc_type, svc_id = daemon.split(".", 1)
>>     metadata = self.get_metadata(svc_type, svc_id)
>>
>>     for path, counter_info in counters.items():
>>         if counter_info['type'] & self.PERFCOUNTER_HISTOGRAM:
>>             continue

It's a bit difficult to test and would also break backwards
compatibility, but I've tried to make get_all_perf_counters() a
generator which yields tuples instead of returning one large dict:

    if counter_schema['type'] & self.PERFCOUNTER_LONGRUNAVG:
        v, c = self.get_latest_avg(
            service['type'],
            service['id'],
            counter_path
        )
        counter_info['value'], counter_info['count'] = v, c
        yield svc_full_name, {counter_path: counter_info}
    else:
        counter_info['value'] = self.get_latest(
            service['type'],
            service['id'],
            counter_path
        )
        yield svc_full_name, {counter_path: counter_info}

As the main cluster is running Luminous it's harder to test, since
there have been changes in MgrModule between Luminous and master, but
this does seem to be an improvement in the small-scale tests I'm able
to do.

A small cluster (12 OSDs) is now able to flush to Influx in ~30ms when
writing 800 data points. Scaling that up I should be able to flush 100k
points in ~5 seconds, but I'm not sure whether it really scales that
way.

Yielding tuples obviously breaks compatibility, so a C++ implementation
might be better, but I'm not very comfortable with that part of the Mgr.

Wido

>> Gathering all the performance counters and metadata of these 2,000
>> daemons brings the grand total to about 95k data points.
>>
>> Influx flushes this within just a few seconds, but it takes the Mgr
>> daemon a lot more time to spit them out.
>>
>> I also see that the ceph-mgr daemon starts to use a lot of CPU when
>> going through this.
>>
>> The Telegraf module also suffers from this, as it uses the same code
>> path to fetch these counters.
>>
>> Is there anything we can do better inside the modules? Or something to
>> be improved inside the Mgr?
>
> There's definitely room to make get_all_perf_counters *much* more
> efficient. It's currently issuing individual get_counter() calls into
> C++ land for every counter, and get_counter is returning the last N
> values into Python before get_latest throws away all but the latest.
>
> I'd suggest implementing a C++ version of get_all_perf_counters.
> There will always be some ceiling on how much data is practical in the
> "one big endpoint" approach to gathering stats, but if we have a
> potential order of magnitude improvement in this call then we should
> do it.
>
> John
>
>>
>> Wido
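
[Editor's note] As a reference for the generator idea discussed above, here is a
minimal sketch of what a tuple-yielding replacement for get_all_perf_counters()
could look like. This is not the actual patch from the thread: the name
iter_perf_counters() is invented for illustration, only the yield statements are
taken from Wido's message, and the surrounding use of list_servers() and
get_perf_schema() is an assumption about the Luminous-era MgrModule interface.

    # Hypothetical sketch only -- not the actual patch from this thread.
    # It assumes an MgrModule-like object exposing list_servers(),
    # get_perf_schema(), get_latest() and get_latest_avg(), as the
    # Luminous-era ceph-mgr Python interface appears to. Instead of
    # building one large dict, it yields one
    # (svc_full_name, {counter_path: counter_info}) tuple per counter,
    # so a consumer such as the Influx or Telegraf module can stream
    # points as it goes rather than wait for the full result.
    def iter_perf_counters(module):
        for server in module.list_servers():
            for service in server['services']:
                svc_full_name = "{0}.{1}".format(service['type'],
                                                 service['id'])
                schema = module.get_perf_schema(service['type'],
                                                str(service['id']))
                if not schema:
                    continue

                # Assumption: get_perf_schema() keys its result by
                # "<type>.<id>"; each value maps counter paths to their
                # schema entries.
                counters = schema.get(svc_full_name, {})
                for counter_path, counter_schema in counters.items():
                    # Skip histograms, as the Influx module already does.
                    if counter_schema['type'] & module.PERFCOUNTER_HISTOGRAM:
                        continue

                    counter_info = dict(counter_schema)
                    if counter_schema['type'] & module.PERFCOUNTER_LONGRUNAVG:
                        v, c = module.get_latest_avg(service['type'],
                                                     service['id'],
                                                     counter_path)
                        counter_info['value'], counter_info['count'] = v, c
                    else:
                        counter_info['value'] = module.get_latest(
                            service['type'],
                            service['id'],
                            counter_path)

                    yield svc_full_name, {counter_path: counter_info}

Because the same daemon now appears once per counter rather than once per dict
entry, a consumer like the Influx module's loop quoted at the top of the thread
would want to cache its get_metadata() lookup per daemon instead of repeating it
for every yielded tuple.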