Re: Performance of Ceph Mgr and fetching daemon counters

On 08/06/2018 06:24 PM, John Spray wrote:
> On Mon, Aug 6, 2018 at 5:04 PM Wido den Hollander <wido@xxxxxxxx> wrote:
>>
>> Hi,
>>
>> I'm working with a customer on speeding up the Influx and Telegraf
>> modules, which gather statistics from their cluster of 2,000 OSDs.
>>
>> The problem I'm running into is the performance of the Influx module,
>> but this seems to boil down to the Mgr daemon.
>>
>> Gathering and sending all statistics of the cluster takes about 35
>> seconds with the current code of the Influx module.
>>
>> By using iterators, queues and multi-threading I was able to bring this
>> down to ~20 seconds, but the main problem is this piece of code:
>>
>>     for daemon, counters in six.iteritems(self.get_all_perf_counters()):
>>         svc_type, svc_id = daemon.split(".", 1)
>>         metadata = self.get_metadata(svc_type, svc_id)
>>
>>         for path, counter_info in counters.items():
>>             if counter_info['type'] & self.PERFCOUNTER_HISTOGRAM:
>>                 continue
>>
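
For reference, the iterator + queue + worker-thread approach mentioned
above is roughly the standalone sketch below; send_batch stands in for
the actual Influx write call, and the names and batch size are only
illustrative:

    import threading

    from six.moves import queue


    def flush_concurrently(points, send_batch, n_workers=4, batch_size=1000):
        """Chunk a lazy iterator of points and flush the chunks from
        worker threads, so the producer only has to generate data."""
        q = queue.Queue()

        def worker():
            while True:
                batch = q.get()
                if batch is None:        # sentinel: no more work
                    break
                send_batch(batch)        # e.g. one batched write per chunk

        workers = [threading.Thread(target=worker) for _ in range(n_workers)]
        for t in workers:
            t.start()

        # Producer: chunk the point iterator into fixed-size batches.
        batch = []
        for point in points:
            batch.append(point)
            if len(batch) >= batch_size:
                q.put(batch)
                batch = []
        if batch:
            q.put(batch)

        # Tell every worker to stop, then wait for the flushes to finish.
        for _ in workers:
            q.put(None)
        for t in workers:
            t.join()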

It's a bit difficult to test and would also break backwards
compatibility, but I've tried to turn get_all_perf_counters() into a
generator which yields tuples instead of building one large dict:

                if counter_schema['type'] & self.PERFCOUNTER_LONGRUNAVG:
                    v, c = self.get_latest_avg(
                        service['type'],
                        service['id'],
                        counter_path
                    )
                    counter_info['value'], counter_info['count'] = v, c
                else:
                    counter_info['value'] = self.get_latest(
                        service['type'],
                        service['id'],
                        counter_path
                    )

                # yield one (daemon, {counter_path: counter_info}) tuple per
                # counter instead of accumulating one big dict for the cluster
                yield svc_full_name, {counter_path: counter_info}
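
On the consumer side the Influx module would then iterate the generator
directly (six.iteritems() no longer applies) and stream points out
instead of materialising everything first. A rough sketch, with
illustrative measurement/tag names rather than the module's real schema:

    def gather_points(self):
        # Lazily walk the per-counter tuples; nothing forces the whole
        # cluster's counters into one dict any more.
        for daemon, counters in self.get_all_perf_counters():
            for path, counter_info in counters.items():
                if counter_info['type'] & self.PERFCOUNTER_HISTOGRAM:
                    continue
                yield {
                    "measurement": "ceph_daemon_stats",  # illustrative
                    "tags": {"ceph_daemon": daemon, "type_instance": path},
                    "fields": {"value": counter_info['value']},
                }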

As the main cluster is running Luminous, testing is harder because
MgrModule has changed between Luminous and master, but the change does
seem to help in the small-scale tests I'm able to do.

A small cluster (12 OSDs) is now able to flush its ~800 data points to
Influx in ~30ms. Scaling that up linearly I should be able to flush 100k
points in ~5 seconds, but I'm not sure it actually scales that way.
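
(Back of the envelope, assuming linear scaling: 100,000 / 800 = 125x the
data, times ~30ms per flush, is roughly 3.75 seconds, so ~5 seconds
leaves some headroom on the Mgr side.)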

Yielding tuples obviously breaks backwards compatibility, so a C++
implementation might be better, but I'm not very comfortable with that
part of the Mgr.

Wido

>> Gathering all the performance counters and metadata of these 2,000
>> daemons brings the grand total to about 95k data points.
>>
>> Influx flushes this within just a few seconds, but it takes the Mgr
>> daemon a lot more time to spit them out.
>>
>> I also see that the ceph-mgr daemon starts to use a lot of CPU when
>> going through this.
>>
>> The Telegraf module also suffers from this as it uses the same code path
>> to fetch these counters.
>>
>> Is there anything we can do better inside the modules? Or something to
>> be improved inside the Mgr?
> 
> There's definitely room to make get_all_perf_counters *much* more
> efficient.  It's currently issuing individual get_counter() calls into
> C++ land for every counter, and get_counter is returning the last N
> values into python before get_latest throws away all but the latest.
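
(For context, get_latest on the Python side is roughly the snippet
below, paraphrased rather than the exact source; every counter costs a
full get_counter() round trip whose older samples are simply discarded:)

    def get_latest(self, daemon_type, daemon_name, counter):
        # get_counter() crosses into C++ and copies the last N samples
        # into Python; only the newest (timestamp, value) pair is kept.
        data = self.get_counter(daemon_type, daemon_name, counter)[counter]
        if data:
            return data[-1][1]
        return 0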
> 
> I'd suggest implementing a C++ version of get_all_perf_counters.
> There will always be some ceiling on how much data is practical in the
> "one big endpoint" approach to gathering stats, but if we have a
> potential order of magnitude improvement in this call then we should
> do it.
> 
> John
> 
>>
>> Wido