Hi there!

Has anyone any experience with the Influx Ceph mgr module?

I am running Ceph 17.2.7 on CentOS 8 Stream. I configured the module on one of my clusters and tested it with "ceph influx send" (the official documentation mentions "ceph influx self-test", which does not exist), but nothing reaches the Influx database. Here is my config (password not shown):
mgr  advanced  mgr/influx/database    cephct     *
mgr  advanced  mgr/influx/hostname               *
mgr  advanced  mgr/influx/interval    300        *
mgr  advanced  mgr/influx/password    ****       *
mgr  advanced  mgr/influx/ssl         false      *
mgr  advanced  mgr/influx/username    cephctusr  *
mgr  advanced  mgr/influx/verify_ssl  false      *

Some time after enabling the module, I see this in the MGR/MON logs:

2024-02-13T09:06:41.283+0100 7f5be9fff700 0 [influx ERROR root] Queue is full, failed to add chunk

and "ceph health detail" shows:

[WRN] MGR_INFLUX_QUEUE_FULL: Failed to chunk to InfluxDB Queue
    Queue is full. InfluxDB might be slow with processing data

(I searched a bit for "failed to chunk" but found nothing)
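
To make clear what I think the message means: as far as I understand, an error like this is what a bounded producer/consumer queue gives when nothing drains it, for instance because the sender never manages to deliver to the database. Below is a minimal, generic sketch of that behaviour. It is not the module's actual code; the name `chunks`, the helper `enqueue` and the queue size are invented for the illustration.

```python
import queue

# Hypothetical stand-in for an internal send buffer; the size 3 is made up.
chunks = queue.Queue(maxsize=3)

def enqueue(chunk):
    """Try to add a chunk without blocking; return False when the queue is full."""
    try:
        chunks.put(chunk, block=False)
        return True
    except queue.Full:
        return False

# Nothing consumes from the queue here, which mimics a sender that
# cannot deliver data: the first three chunks fit, the rest are refused.
results = [enqueue({"n": i}) for i in range(5)]
print(results)
```

If this picture is right, the warning would be a symptom of writes not getting through rather than of InfluxDB being slow.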

MGR and MON are co-located, and I verified (by installing influxdb by hand) that from the MON the command influx -database cephct -username cephctusr -password '****' -host
indeed works.
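
The check above only proves that I can log in and query. To test writing independently of the mgr module, something like the sketch below could be used. It assumes an InfluxDB 1.x server exposing the HTTP /write endpoint on the default port 8086. The helper names, the measurement `mgr_test`, the `cluster` tag and the host `influx.example.org` are all made up for the illustration, and the line builder does no escaping and handles only plain numeric fields.

```python
import urllib.parse
import urllib.request

def build_line(measurement, tags, fields, timestamp_ns):
    """Build one InfluxDB line-protocol record (no escaping, numeric fields only)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"

def write_point(base_url, database, username, password, line):
    """POST one record to the InfluxDB 1.x /write endpoint and return the HTTP status."""
    query = urllib.parse.urlencode({"db": database, "u": username, "p": password})
    request = urllib.request.Request(
        f"{base_url}/write?{query}", data=line.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status  # InfluxDB 1.x answers 204 on a successful write

# The "cluster" tag is one hypothetical way to tell several clusters apart.
line = build_line("mgr_test", {"cluster": "cephct"}, {"value": 1.0}, 1707811601000000000)
print(line)

# Example call (placeholder host, not my real server):
# write_point("http://influx.example.org:8086", "cephct", "cephctusr", "****", line)
```

If such a write succeeds from the MGR host while the module still reports a full queue, the problem is more likely in the module configuration than in the network or the credentials.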
Actually, at some point during my tests yesterday morning some data did arrive at the InfluxDB server, but only for 5 minutes or so. It is practically impossible for me now to reconstruct what the configuration was at that time... maybe it happened during a server reboot?
In any case, only the following measurements
were populated, and they do not contain terribly exciting metrics: only the status of PGs for each pool and the number of PGs per OSD. I guess the interesting metrics described in the documentation (latency, bytes, operations...) should end up in some other measurement.

I am not particularly attached to Influx; I am just looking for "something" (Influx? Telegraf?) to store metrics and eventually plot them in Grafana, to replace the current Zabbix-based solution. I experimented with Prometheus some time ago, with some satisfaction, although it requires a scraper, which I'd be happy to avoid, especially given the following point. An additional constraint is that I have at least 3 distinct Ceph production clusters to monitor, so I'd need a simple way to tell them apart.

How are you dealing with these matters, namely storing configuration and metrics "somewhere"?

Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
