Re: Request for community feedback: Telemetry Performance Channel

Frank Schilder <frans@xxxxxx> · Tue, 11 Jan 2022 14:31:04 +0000

Hi Laura and Greg,

Some experience from my side. The idea of the telemetry module is nice, but there is a real problem: its performance is such that on any reasonably sized and loaded cluster it crashes the MGR. This holds at least up to Mimic. In the Ceph courses I took and the recommendations I read before deploying our cluster, the advice was always "disable the telemetry module".

As a consequence, the telemetry data the devs receive comes, if at all, largely from toy or small home-storage clusters. I have serious doubts that such skewed data will provide much useful insight into how large clusters perform. When I look at the public dashboard, I see an OSD count of 74,620 and 1,609 active clusters, which gives an average cluster size of 46.4 OSDs. There are only 54 medium-large clusters (>=2PB). Our cluster is about 11PB with 1051 OSDs. Only 4 clusters are listed in this category, and I still consider ours a small cluster. In other words, the clusters that are really interesting for a scale-out storage system, the ones much larger than ours, are not sending data.
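
For anyone who wants to redo the average with newer dashboard numbers, here is a minimal sketch of the arithmetic. The two input values are the ones quoted above; the variable names are only illustrative.

```python
# Figures as read from the public telemetry dashboard in this message.
osd_count = 74_620
active_clusters = 1_609

# Mean number of OSDs per reporting cluster.
avg_osds_per_cluster = osd_count / active_clusters

print(f"average cluster size: {avg_osds_per_cluster:.1f} OSDs")
```

A mean this low says nothing about the largest reporting cluster, but it does suggest that small clusters dominate the sample.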

I'm pretty sure this is due to the poor performance of the Python callbacks that many MGR modules seem to use because they are easy to implement. However, they are not cheap to execute.

Wouldn't it make sense to address this bottleneck instead of collecting even more not-so-interesting information? It would help if such scraping workloads were at least distributed over multiple threads or even multiple MGRs. We have 5 MGR instances, but only 1 is doing anything, in 1 thread, and enabling the iostat module already overloads/crashes the MGR.
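
To illustrate what I mean by distributing the scraping, here is a rough sketch of a fan-out over a small thread pool. It is not how the MGR works internally: `scrape_daemon` and `scrape_all` are hypothetical names, and the returned counters are placeholders. Note also that Python threads only help when each scrape spends its time waiting on I/O; CPU-bound callbacks would still be serialized by the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_daemon(daemon_id):
    # Hypothetical stand-in for one per-daemon scrape. A real scrape would
    # wait on a reply from the daemon; here we just return placeholder data.
    return {"daemon": daemon_id, "num_read": 0, "num_write": 0}


def scrape_all(daemon_ids, max_workers=4):
    # Fan the scrapes out over a small pool instead of running them one by
    # one in a single thread. map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scrape_daemon, daemon_ids))


results = scrape_all([f"osd.{i}" for i in range(8)])
```

The same pattern would apply across MGR instances, with each instance taking a slice of the daemon list, but that needs coordination that this sketch does not attempt.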

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Laura Flores <lflores@xxxxxxxxxx>
Sent: 14 December 2021 16:29
To: Gregory Farnum
Cc: ceph-users
Subject: Re: Request for community feedback: Telemetry Performance Channel

Hi Gregory,

It was intentional that I sent this email to the ceph-users list. The
telemetry module is designed as a relationship between developers and
users, where developers decide on the metrics to collect, and users
decide whether or not to opt in. Since the performance channel will be
a new addition to Quincy, it is important that we get feedback from
users while we are still in the development phase, since it will
always be up to the users to decide whether or not they want to share
these metrics with us.

To address your question about heap stats vs. heap dump, you are
correct in your assumption that I meant to say stats:

2. tcmalloc_heap_stats: A dump of tcmalloc heap profiles on a
per-osd basis. These metrics would be derived from the `ceph tell
osd.* heap stats` command.

We are interested in collecting these metrics to detect scenarios
where the osd has freed memory, but the kernel has not reclaimed it
from tcmalloc.
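
As a rough illustration of the kind of check this would enable, the sketch below parses heap stats text and computes how much memory tcmalloc is holding on its page heap freelist relative to what the application is using. The sample text follows the layout shown in the memory-profiling docs, but the values are illustrative, and the names `parse_heap_stats` and `freelist_ratio` are hypothetical, not part of the telemetry module.

```python
import re

# Illustrative sample in the "MALLOC: ..." layout of the heap stats output.
SAMPLE = """\
MALLOC:        2632288 (    2.5 MiB) Bytes in use by application
MALLOC: +       499712 (    0.5 MiB) Bytes in page heap freelist
MALLOC: +       543800 (    0.5 MiB) Bytes in central cache freelist
MALLOC: +            0 (    0.0 MiB) Bytes released to OS (aka unmapped)
"""

# Byte count, then the human-readable size in parentheses, then the label.
LINE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s*\(.*?\)\s*(.+)$")


def parse_heap_stats(text):
    """Map each label (e.g. 'Bytes in page heap freelist') to its byte count."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


def freelist_ratio(stats):
    """Share of (in use + page heap freelist) that sits on the freelist."""
    in_use = stats["Bytes in use by application"]
    held = stats["Bytes in page heap freelist"]
    return held / (in_use + held)


stats = parse_heap_stats(SAMPLE)
ratio = freelist_ratio(stats)
```

A persistently high ratio on an OSD would be one hint that freed memory is not being returned; what threshold counts as "high" is something we would have to learn from the collected data.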

On Mon, Dec 13, 2021 at 6:57 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> [ Moved to dev@ceph as this is a technical thing, not user feedback or
> concerns ]
>
> On Mon, Dec 13, 2021 at 1:54 PM Laura Flores <lflores@xxxxxxxxxx> wrote:
> >
> > Dear Ceph users,
> >
> > I'm writing to inform the community about a new performance channel
> > that will be added to the telemetry module in the upcoming Quincy
> > release. Like all other channels, this channel is also on an opt-in
> > basis, but we’d like to know if there are any concerns regarding this
> > new collection and whether users would feel comfortable sharing
> > performance related data with us. Please review the details below and
> > respond to this email with any thoughts or questions you might have.
> >
> > We’ll also discuss this topic live at our next "Ceph User + Dev
> > Monthly Meetup", which will be held this week on December 16th. Feel
> > free to join this meeting and provide direct feedback to developers.
> >
> > U+D meeting details:
> > https://calendar.google.com/calendar/u/0/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0@xxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > -------------------------------------------------------------------------------------------------------------------------------
> >
> > The telemetry module has been around since Luminous v12.2.13.
> > Operating on a strict opt-in basis, the telemetry module sends
> > anonymous data about Ceph users’ clusters securely back to the Ceph
> > developers. This data, which is displayed on public dashboards [1],
> > helps developers understand how Ceph is used and what problems users
> > may be experiencing. The telemetry module is divided into several
> > channels, each of which collects a different set of information.
> > Existing channels include "basic", "crash", "device", and "ident". All
> > existing channels, as well as future channels, offer users the choice
> > to opt in. See the latest documentation for more details:
> > https://docs.ceph.com/en/latest/mgr/telemetry/
> >
> > For the upcoming Quincy release, we have designed a new performance
> > channel ("perf") that collects various performance metrics across a
> > cluster. As developers, we would like to use data from the perf
> > channel to:
> >
> >     1. gain a better understanding of how clusters are used
> >     2. discover changes in cluster usage over time
> >     3. identify performance-related bottlenecks
> >     4. model benchmarks used in upstream testing on workload patterns
> > seen in the field
> >     5. suggest better Ceph configuration values based on use case
> >
> > In addition, and most importantly, we have designed the perf channel
> > with users in mind. Our goal is to provide users with better access to
> > detailed performance information about their clusters that they can
> > find all in one place. With this performance data, we aim to provide
> > users the ability to:
> >
> >     1. monitor their own cluster’s performance by daemon type
> >     2. access detailed information about their cluster's overall health
> >     3. identify patterns in their cluster’s workload
> >     4. troubleshoot performance issues in their cluster, e.g. issues
> > with latency, throttling, or memory management
> >
> > In the process of designing the perf channel, we also saw a need for
> > users to be able to view the data they are sending when telemetry is
> > on, as well as the data that is available to send when telemetry is
> > off. With this new design, a user can look at which collections they
> > are reporting when telemetry is on with the command `ceph telemetry
> > show`. If telemetry is off, meaning the user has not opted in to
> > sending data, they can preview a sample report with `ceph telemetry
> > preview`. This same flow can be followed by channel, if preferred:
> > `ceph telemetry show <channel_name>` or `ceph telemetry preview
> > <channel_name>`.
> >
> > In the case of the perf channel, a user who is opted into telemetry
> > (telemetry is on) may view a report of the perf collection with `ceph
> > telemetry show perf`.  A user who is not opted into telemetry
> > (telemetry is off) may view a preview of the perf collection with
> > `ceph telemetry preview perf`.
> >
> >
> > Metrics in the perf channel are reported on an individual
> > daemon/pool/pg basis. As such, the length of the perf report will
> > depend on how many daemons, pools, and pgs a cluster has. We decided
> > to go this route instead of aggregating the metrics since aggregation
> > abstracts the data and makes it difficult to identify problems from
> > individual daemons, pools, and pgs. Metrics that the perf channel
> > collects can be summarized by these categories:
> >
> >     1. perf_counters: All performance counters available to the
> > manager module that have a "USEFUL" priority or higher. Counters are
> > grouped by daemon types (e.g. mds, mon, osd, rgw). These metrics in
> > the perf channel look very similar to the output you would get from
> > the `ceph tell {daemon_type}.* perf dump` command.
> >
> >     2. io_rate: The current change in IOPS done on the Ceph cluster,
> > fetched from the manager module. This includes a delta in pg log size,
> > store stats, and read/write operations. We use these same metrics
> > (num_read, num_read_kb, num_write, num_write_kb) to generate output
> > from the iostat module.
> >
> >     3. osd_perf_histograms: 2d histograms that measure the
> > relationship between latency and request size due to certain
> > read/write operations on OSDs. The histograms collected in the perf
> > channel are derived from the `ceph tell osd.* perf histogram dump`
> > command.
> >
> >     4. stats_per_pg: A dump of IOPS, log size, and scrubbing metrics
> > on a per-pg basis. We fetch this information from the manager module,
> > but the output can also be found in `ceph pg dump`.
> >
> >     5. stats_per_pool: A dump of IOPS, pg log size, and store stats on
> > a per-pool basis. We fetch this information from the manager module,
> > but the output can also be found in `ceph pg dump`. We refrain from
> > collecting pool names here to avoid any sensitive information.
> >
> >     6. mempool: Memory allocations grouped by container on a per-osd
> > basis. The mempool metrics collected in the perf channel are derived
> > from the `ceph tell osd.* dump_mempools` command.
> >
> >
> > We are still in the process of adding these metrics:
> >
> >     1. rocksdb_stats: A dump of metrics used to analyze performance
> > from the RocksDB key-value store, such as compaction time. These
> > metrics will be derived from a new admin socket command that is
> > undergoing review.
> >
> >     2. tcmalloc_heap_stats: A dump of tcmalloc heap profiles on a
> > per-osd basis. These metrics would be derived from the `ceph tell
> > osd.* heap dump` command.
>
> Perhaps you mean the "heap stats" output rather than "heap dump"? I'm
> referencing the docs[1] to make sure I get this right, but "heap
> stats" will disclose the tcmalloc allocation state, plus virtual and
> physical memory in use. "heap dump" will contain the actual function
> calls responsible for malloc() but requires the heap profiler to be
> running, and running the profiler comes at a CPU cost (I think mild?)
> and a memory cost (which can be quite severe). So unfortunately I
> don't think we can expect to obtain that level of detail.
> -Greg
>
> [1]: https://docs.ceph.com/en/pacific/rados/troubleshooting/memory-profiling
>
> >     3. osd_dump_stats: Here, we are mainly interested in collecting
> > the pool applications (e.g. "rbd" or "mgr") so we have a better sense
> > for what purpose each pool is used. If we collect these metrics, we
> > would screen out any sensitive information such as pool names.
> >
> >
> > Attached at the bottom of this email are sample reports (with the perf
> > channel enabled) that we took from our Long Running Cluster (LRC). The
> > LRC’s services include 5 monitors, 3 managers, 3 mds, and 89 osds.
> > Data includes 19 pools and 2,833 pgs. In reviewing this report, you
> > can see the exact metrics that we are collecting in the perf channel,
> > and the exact structure in which those metrics will be presented in
> > the telemetry report.
> >
> > At this point, we’d like to know if there are any concerns regarding
> > the data we plan to include in the performance report and whether
> > users are comfortable sharing it with us.
> >
> > Thanks,
> > Laura Flores
> >
> > [1] Telemetry Public Dashboards -- https://telemetry-public.ceph.com
> > [2] Sample Telemetry Full Report --
> > https://gist.github.com/ljflores/720d32e6d5b8a6f8f42d9eec0428d8da
> > [3] Sample Telemetry Perf Report --
> > https://gist.github.com/ljflores/78a5764dc97d73dd63b341929976ae55
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>

--

Laura Flores
She/Her/Hers
Associate Software Engineer, Ceph Storage
Red Hat Inc.
La Grange Park, IL
lflores@xxxxxxxxxx
M: +17087388804

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx