Re: Multiple Metric Generation Locations in Ceph

On Tue, Feb 27, 2024 at 3:46 PM Ali Maredia <amaredia@xxxxxxxxxx> wrote:
>
> Redouane and Avan came to me with an issue with RGW related metrics that warrants a broader community discussion for all daemons. For more information, the issue is being tracked by https://tracker.ceph.com/issues/64598
>
> Currently, metrics consumed by Prometheus related to the RGW are being generated by combining two parts:
> 1. The RGW perf counters: these counters are generated by the ceph-exporter by parsing the output of the rgw command `ceph counter dump`.
> 2. The RGW metadata (daemon, ceph-version, hostname, etc): this information is generated by the prometheus mgr module.
>
> To combine the two parts ceph-exporter uses a key field called instance_id, which is generated as follows:
> 1. On the ceph-exporter side, the admin socket (asok) filename is parsed to extract the daemon_id, which is used to derive the instance_id.
> 2. On the prometheus-mgr module side, the orchestrator (cephadm or rook) is called to get the daemon_id, and the instance_id is then derived from the daemon_id.
>
> This approach/design suffers from the following issues:
> 1. It creates a strong dependency between the prometheus-mgr module and the orchestrator module (this has already caused issues for Rook environments: ceph v18.2.1 metrics are completely broken because of this).
> 2. instance_id handling on the ceph-exporter side is fragile, as it relies on socket filename parsing.
> 3. instance_id generation is error-prone, as it relies on how daemon_ids are handled by the orchestrator module (which differs between rook and cephadm).
>
> The issue for RGW is that with certain orchestrators, for example in Rook, there is a mismatch between the instance IDs for the metrics emitted by the exporter and the metrics from the prometheus manager module.
> This has ramifications when running queries in Prometheus, where the instance_id is the key used to join the metrics in the queries.
>
> There are many options for solutions, and I'd be happy to hear the community's thoughts about what they think.
>
> Here are ours (Avan, Redouane, and I):
> 1. We think daemon specific metrics meant for Prometheus should only be emitted from one place, and that place should be the newer ceph-exporter.
> 2. We discussed having a command you can run on an admin socket that would emit all of the metadata that is currently being sent by the manager module. This way we're not relying on parsing file names anymore.
> 3. The prometheus-mgr module will still exist and will be used to emit cluster-wide metrics.

Hi Ali,

+1

>
> The command could be something like `ceph who-am-i` that you would expect to work on any daemon's admin socket, or something daemon-specific like `ceph rgw-info`.

IIRC both OSD and MDS daemons return some high-level daemon-specific
information with the "status" command.  RGW could do the same.

The version is already returned universally with the "version" command.

For other information needed by ceph-exporter to decorate metrics with
common metadata, I think a new command is warranted.  The output format
should be the same for all daemons and the code that would parse it in
ceph-exporter should be daemon-agnostic.
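
As a rough sketch of what daemon-agnostic parsing could look like
(written in Python for brevity; ceph-exporter itself is C++): the JSON
shape, the field names, and the metric name below are all hypothetical,
since no such command exists yet.  The point is only that the parser
never branches on the daemon type.

```python
import json

# Hypothetical output of a new admin socket metadata command.
# The field names are assumptions, not an existing Ceph schema.
SAMPLE_OUTPUT = """
{
  "daemon_type": "rgw",
  "daemon_id": "foo.host1.abcdef",
  "hostname": "host1",
  "ceph_version": "18.2.1"
}
"""

# The same keys would be expected from every daemon (OSD, MDS, RGW, ...).
REQUIRED_KEYS = ("daemon_type", "daemon_id", "hostname", "ceph_version")


def metadata_labels(raw: str) -> dict:
    """Parse the metadata JSON into a flat dict of label values."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError("missing metadata keys: " + ", ".join(missing))
    return {key: str(data[key]) for key in REQUIRED_KEYS}


def format_metric(name: str, labels: dict, value) -> str:
    """Render one sample in the Prometheus text exposition format.

    Label values are not escaped here; a real exporter would need to
    escape backslashes, double quotes, and newlines.
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


if __name__ == "__main__":
    labels = metadata_labels(SAMPLE_OUTPUT)
    print(format_metric("ceph_daemon_metadata", labels, 1))
```

With something like this, the labels come straight from the daemon
itself, so neither socket filename parsing nor an orchestrator call is
needed to derive them.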

Thanks,

                Ilya