Help with setting up the Influx MGR module: ERROR - queue is full

Hi there!

Does anyone have any experience with the Influx Ceph mgr module?

I am running 17.2.7 on CentOS 8 Stream. I configured one of my clusters and test with "ceph influx send" (whereas the official doc https://docs.ceph.com/en/quincy/mgr/influx/ mentions a non-existent "ceph influx self-test"), but nothing reaches the Influx database. Here is my config (password not shown):
mgr  advanced  mgr/influx/database    cephct                      *
mgr  advanced  mgr/influx/hostname    influxdb-dev.cloud.garr.it  *
mgr  advanced  mgr/influx/interval    300                         *
mgr  advanced  mgr/influx/password    ****                        *
mgr  advanced  mgr/influx/ssl         false                       *
mgr  advanced  mgr/influx/username    cephctusr                   *
mgr  advanced  mgr/influx/verify_ssl  false                       *
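
(For reference, the options above were set roughly like this; a sketch, with the password redacted:)

    ceph config set mgr mgr/influx/hostname influxdb-dev.cloud.garr.it
    ceph config set mgr mgr/influx/database cephct
    ceph config set mgr mgr/influx/username cephctusr
    ceph config set mgr mgr/influx/password '****'
    ceph config set mgr mgr/influx/interval 300
    ceph config set mgr mgr/influx/ssl false
    ceph config set mgr mgr/influx/verify_ssl false
    ceph mgr module enable influx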

After enabling the module, I see the following in the MGR/MON logs after a while:

2024-02-13T09:06:41.283+0100 7f5be9fff700 0 [influx ERROR root] Queue is full, failed to add chunk

and "ceph health detail" shows:

[WRN] MGR_INFLUX_QUEUE_FULL: Failed to chunk to InfluxDB Queue
    Queue is full. InfluxDB might be slow with processing data

(I searched a bit for "failed to chunk" but found nothing)
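
In case it helps, more detail can be squeezed out of the module by raising the mgr debug level around a manual send (a sketch; level 20 is very verbose):

    ceph config set mgr debug_mgr 20
    ceph influx send
    # inspect the active mgr's log, then drop the override again:
    ceph config rm mgr debug_mgr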


The MGR coexists with a MON, and I verified (by installing the influx client by hand) that the following command, run from the MON host, indeed works:

    influx -database cephct -username cephctusr -password '****' -host influxdb-dev.cloud.garr.it
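
Since the mgr module talks to InfluxDB over HTTP rather than through the CLI, the HTTP endpoint can also be checked directly (a sketch, assuming InfluxDB 1.x on the module's default port 8086; an HTTP 204 reply means the server is reachable):

    curl -i http://influxdb-dev.cloud.garr.it:8086/ping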
Hmm, actually, at some point during my tests something did arrive at the InfluxDB server, but only for five minutes or so yesterday morning. It is practically impossible for me now to reconstruct what the configuration was at the time... maybe during a server reboot?
In any case, only the following measurements
    ceph_pg_summary_osd
    ceph_pg_summary_pool
were populated, and they do not contain terribly exciting metrics: only the status of PGs for each pool and the number of PGs per OSD. I guess the interesting metrics reported in the documentation (latency, bytes, operations...) should end up in some other measurement.
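
(This is how one can list what actually got written, assuming the standard InfluxQL client:)

    influx -database cephct -username cephctusr -password '****' \
        -host influxdb-dev.cloud.garr.it -execute 'SHOW MEASUREMENTS'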

I am not particularly fond of Influx; I am just looking for "something" (Influx? Telegraf?) to store metrics and eventually plot them in Grafana, replacing the current Zabbix-based solution. Some time ago I experimented with Prometheus with some satisfaction, although it requires a scraper, which I'd be happy to avoid, especially given the point below. An additional constraint is that I have at least 3 distinct Ceph production clusters to monitor, so I'd need a simple way to differentiate them.
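
For what it's worth, if I went the Prometheus route, the per-cluster differentiation could probably live on the scraper side rather than in Ceph itself (a sketch, assuming the prometheus mgr module on its default port 9283; hostnames and labels are made up):

    ceph mgr module enable prometheus
    # then, on the Prometheus side, one scrape job per cluster with a
    # distinguishing label, e.g. in prometheus.yml:
    #   - job_name: ceph-clusterA
    #     static_configs:
    #       - targets: ['mon1.clusterA.example:9283']
    #         labels:
    #           cluster: clusterA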

How are you dealing with these matters, namely storing configuration and metrics "somewhere"?

Thanks a lot! (for your patience in reading this, at least)

			Fulvio


--
Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
skype: fgaleazzi70


