Hi there! Does anyone have experience with the Ceph mgr influx module? I am running 17.2.7 on CentOS 8 Stream. I configured the module on one of my clusters and tested with "ceph influx send" (the official doc https://docs.ceph.com/en/quincy/mgr/influx/ mentions "ceph influx self-test", which does not exist), but nothing reaches the InfluxDB database. Here is my config (password not shown):
  mgr  advanced  mgr/influx/database    cephct                      *
  mgr  advanced  mgr/influx/hostname    influxdb-dev.cloud.garr.it  *
  mgr  advanced  mgr/influx/interval    300                         *
  mgr  advanced  mgr/influx/password    ****                        *
  mgr  advanced  mgr/influx/ssl         false                       *
  mgr  advanced  mgr/influx/username    cephctusr                   *
  mgr  advanced  mgr/influx/verify_ssl  false                       *

After enabling the module, in the MGR/MON logs I see, after a while:

  2024-02-13T09:06:41.283+0100 7f5be9fff700  0 [influx ERROR root] Queue is full, failed to add chunk
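For completeness, this is roughly the sequence I used to enable and configure the module (reconstructed after the fact, so take it as a sketch; the values are the same as in the dump above):

  ceph mgr module enable influx
  ceph config set mgr mgr/influx/hostname influxdb-dev.cloud.garr.it
  ceph config set mgr mgr/influx/database cephct
  ceph config set mgr mgr/influx/username cephctusr
  ceph config set mgr mgr/influx/password '****'
  ceph config set mgr mgr/influx/interval 300
  ceph config set mgr mgr/influx/ssl false
  ceph config set mgr mgr/influx/verify_ssl false
  ceph influx send    # push immediately instead of waiting for the interval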
and "ceph health detail" shows: WRN] MGR_INFLUX_QUEUE_FULL: Failed to chunk to InfluxDB Queue Queue is full. InfluxDB might be slow with processing data (edited) (I searched a bit for "failed to chunk" but found nothing)MGR coexist with MON, and I verified (by installing influxdb by hand) that from the MON the command influx -database cephct -username cephctusr -password '****' -host influxdb-dev.cloud.garr.it
indeed works.

Hmm, actually, while running my tests something did arrive at the InfluxDB server, but only for 5 minutes or so yesterday morning; it is practically impossible for me to reconstruct now what the configuration was at that moment... maybe it happened during a server reboot?
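In the meantime, one thing I still want to try for the "Queue is full" error is raising the module's sender settings, on the assumption that the mgr/influx/threads and mgr/influx/batch_size options listed in the docs are still honoured in Quincy (the numbers below are picked arbitrarily, just higher than the defaults):

  ceph config set mgr mgr/influx/threads 8
  ceph config set mgr mgr/influx/batch_size 10000
  # restart the module so it picks up the new values
  ceph mgr module disable influx
  ceph mgr module enable influx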
In any case, only the following measurements were populated:

  ceph_pg_summary_osd
  ceph_pg_summary_pool

and they do not contain terribly exciting metrics, only the status of PGs for each pool and the number of PGs per OSD. I guess the more interesting metrics described in the documentation (latency, bytes, operations...) should end up in some other measurement.
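(For reference, this is how one can list what actually landed in the database with the 1.x client, same credentials as above:

  influx -host influxdb-dev.cloud.garr.it -database cephct \
         -username cephctusr -password '****' \
         -execute 'SHOW MEASUREMENTS'

)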
I am not particularly fond of Influx; I am just looking for "something" (Influx? Telegraf?) to store metrics and eventually plot them in Grafana, to replace the current Zabbix-based solution. Some time ago I experimented with Prometheus with some satisfaction, although it requires a scraper, which I would be happy to avoid, especially given the point below. An additional constraint is that I have at least 3 distinct Ceph production clusters to monitor, so I need a simple way to differentiate them (see the sketch below).
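If I do end up back on Prometheus, my current idea for telling the clusters apart is to enable the built-in exporter on each cluster and attach a per-cluster label at scrape time, roughly like this (a sketch only; host names and label values are placeholders, and the exporter port is 9283 by default if I remember correctly):

  # on each cluster
  ceph mgr module enable prometheus

  # in prometheus.yml on the scraping host
  scrape_configs:
    - job_name: 'ceph'
      static_configs:
        - targets: ['mon1.cluster-one.example:9283']
          labels:
            cluster: 'cluster-one'
        - targets: ['mon1.cluster-two.example:9283']
          labels:
            cluster: 'cluster-two'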
How are you dealing with these matters, namely storing configuration and metrics "somewhere"?
Thanks a lot! (for your patience in reading this, at least)

    Fulvio

--
Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
skype: fgaleazzi70