Multiple Metric Generation Locations in Ceph

Redouane and Avan came to me with an issue with RGW-related metrics that warrants a broader community discussion, since it applies to all daemons. The issue is being tracked at https://tracker.ceph.com/issues/64598

Currently, the RGW metrics consumed by Prometheus are generated by combining two parts:
1. The RGW perf counters: ceph-exporter generates these by parsing the output of the `counter dump` command run on the RGW admin socket (a sketch of this follows the list).
2. The RGW metadata (daemon, ceph-version, hostname, etc.): the prometheus mgr module generates this information.
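
For illustration, here is a minimal sketch of how a scraper such as ceph-exporter can obtain the first part over the admin socket. The framing mirrors what the `ceph daemon` CLI does (NUL-terminated JSON command, length-prefixed JSON reply); the socket path is an illustrative assumption, and the exact wire details should be treated as such too.

```python
import json
import socket
import struct

def asok_command(path: str, prefix: str) -> dict:
    """Run an admin socket command and return its parsed JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        # The command is sent as NUL-terminated JSON.
        s.sendall(json.dumps({"prefix": prefix}).encode() + b"\0")
        # The reply is a 4-byte big-endian length followed by the payload.
        (length,) = struct.unpack(">I", s.recv(4))
        buf = b""
        while len(buf) < length:
            buf += s.recv(length - len(buf))
    return json.loads(buf)

# The socket path below is illustrative; real names vary by orchestrator.
counters = asok_command("/var/run/ceph/ceph-client.rgw.foo.asok",
                        "counter dump")
print(list(counters)[:5])  # peek at the first few counter groups
```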

To combine the two parts, ceph-exporter uses a key field called instance_id, which is generated as follows:
1. On the ceph-exporter side, the asok (admin socket) filename is parsed to extract the daemon_id, from which the instance_id is derived (see the sketch after this list).
2. On the prometheus mgr module side, the orchestrator (cephadm or Rook) is queried for the daemon_id, and the instance_id is then derived from it.
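
To make point 1 concrete, here is a minimal sketch of the filename parsing, assuming a cephadm-style socket name; the exact pattern is an assumption, and the fact that Rook names sockets differently is precisely what makes this approach fragile.

```python
import re

def daemon_id_from_asok(filename: str) -> str | None:
    """Extract an RGW daemon_id from an asok filename (assumed pattern)."""
    m = re.match(r"^ceph-client\.rgw\.(?P<id>.+)\.asok$", filename)
    return m.group("id") if m else None

# Illustrative cephadm-style name; a Rook socket would parse differently,
# yielding a daemon_id (and thus instance_id) the mgr module won't match.
print(daemon_id_from_asok("ceph-client.rgw.foo.bar.7.94307674948784.asok"))
```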

This design suffers from the following issues:
1. It creates a strong dependency between the prometheus mgr module and the orchestrator module (this has already caused issues in Rook environments: metrics in Ceph v18.2.1 are completely broken because of it).
2. instance_id management on the ceph-exporter side is fragile, since it relies on parsing the socket filename.
3. instance_id generation is error-prone, since it depends on how daemon_ids are handled by the orchestrator module (which differs between Rook and cephadm).

The issue for RGW is that with certain orchestrators, Rook for example, the instance IDs in the metrics emitted by the exporter do not match those in the metrics from the prometheus mgr module.
This breaks Prometheus queries in which instance_id is the join key between the two sets of metrics (see the sketch below).
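
Here is a hedged sketch of the failure mode: a typical dashboard-style query joins counters to metadata on instance_id, and an ID mismatch makes the join return nothing. The metric and label names (`ceph_rgw_req`, `ceph_rgw_metadata`, `instance_id`) approximate what the Ceph dashboards use, and the Prometheus URL is a placeholder.

```python
import requests

# Join RGW request counters to their metadata on instance_id. If the two
# sides disagree on instance_id, the join matches nothing and the result
# set is empty, even though both metric families exist.
PROMQL = (
    "rate(ceph_rgw_req[5m]) "
    "* on (instance_id) group_left (ceph_daemon, hostname) "
    "ceph_rgw_metadata"
)

resp = requests.get(
    "http://prometheus:9090/api/v1/query",  # placeholder endpoint
    params={"query": PROMQL},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"]["result"])  # [] when instance_ids mismatch
```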

There are many possible solutions, and I'd be happy to hear the community's thoughts.

Here are ours (from Avan, Redouane, and me):
1. We think daemon-specific metrics meant for Prometheus should be emitted from only one place, and that place should be the newer ceph-exporter.
2. We discussed adding a command, runnable on a daemon's admin socket, that would emit all of the metadata currently sent by the mgr module. That way we would no longer rely on parsing file names.
3. The prometheus mgr module would still exist and would be used to emit cluster-wide metrics.

The command could be something like `ceph who-am-i`, which you would expect to work on any daemon's admin socket, or something daemon-specific like `ceph rgw-info`.

In other words: move the metadata source from the mgr prometheus module to ceph-exporter, and use this new `ceph who-am-i` command to obtain it. This way, each Ceph daemon will be self-sufficient, able to provide the metadata needed to label/tag its own metrics. A sketch of what the exporter side could look like follows.
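
A minimal sketch, assuming a hypothetical `ceph who-am-i` reply; the field set is an assumption that mirrors the metadata the mgr module emits today, and nothing here is an existing interface.

```python
import json

# Hypothetical `ceph who-am-i` reply from an RGW admin socket (assumed
# fields; this command does not exist yet).
reply = json.loads("""
{
  "daemon_type": "rgw",
  "daemon_id": "foo.bar.7",
  "instance_id": "abc123",
  "hostname": "node1",
  "ceph_version": "18.2.1 reef (stable)"
}
""")

# ceph-exporter could then build metric labels itself: no orchestrator
# round-trip, no socket filename parsing.
labels = {
    "instance_id": reply["instance_id"],
    "ceph_daemon": f"{reply['daemon_type']}.{reply['daemon_id']}",
    "hostname": reply["hostname"],
    "ceph_version": reply["ceph_version"],
}
print(labels)
```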

At the moment this affects at least two daemons, rgw and rbd-mirror, but by following the approach above and introducing the new generic command, we can apply the same pattern to other legacy (or new) daemons.

Looking forward to hearing other thoughts,
Ali, Redouane, Avan