Hi there! Does anyone have experience with the Ceph mgr influx module? I am running 17.2.7 on CentOS 8 Stream. I configured the module on one of my clusters and tested with "ceph influx send" (the official doc https://docs.ceph.com/en/quincy/mgr/influx/ mentions "ceph influx self-test", which does not exist), but nothing reaches the InfluxDB database. Here is my config (password not shown):
  mgr  advanced  mgr/influx/database    cephct                      *
  mgr  advanced  mgr/influx/hostname    influxdb-dev.cloud.garr.it  *
  mgr  advanced  mgr/influx/interval    300                         *
  mgr  advanced  mgr/influx/password    ****                        *
  mgr  advanced  mgr/influx/ssl         false                       *
  mgr  advanced  mgr/influx/username    cephctusr                   *
  mgr  advanced  mgr/influx/verify_ssl  false                       *

After enabling the module, in the MGR/MON logs I see, after a while:

  2024-02-13T09:06:41.283+0100 7f5be9fff700  0 [influx ERROR root] Queue is full, failed to add chunk
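For completeness, this is roughly the sequence I used to enable and configure the module (reconstructed after the fact, so take it as a sketch; the values are the same as in the dump above):

  ceph mgr module enable influx
  ceph config set mgr mgr/influx/hostname influxdb-dev.cloud.garr.it
  ceph config set mgr mgr/influx/database cephct
  ceph config set mgr mgr/influx/username cephctusr
  ceph config set mgr mgr/influx/password '****'
  ceph config set mgr mgr/influx/interval 300
  ceph config set mgr mgr/influx/ssl false
  ceph config set mgr mgr/influx/verify_ssl false
  ceph influx send    # push immediately instead of waiting for the interval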
and "ceph health detail" shows: WRN] MGR_INFLUX_QUEUE_FULL: Failed to chunk to InfluxDB Queue Queue is full. InfluxDB might be slow with processing data (edited) (I searched a bit for "failed to chunk" but found nothing)MGR coexist with MON, and I verified (by installing influxdb by hand) that from the MON the command influx -database cephct -username cephctusr -password '****' -host influxdb-dev.cloud.garr.it
indeed works.

Hmm, actually, while running my tests something did arrive at the InfluxDB server, but only for 5 minutes or so yesterday morning; it is practically impossible for me to reconstruct now what the configuration was at that moment... maybe it happened during a server reboot?
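In the meantime, one thing I still want to try for the "Queue is full" error is raising the module's sender settings, on the assumption that the mgr/influx/threads and mgr/influx/batch_size options listed in the docs are still honoured in Quincy (the numbers below are picked arbitrarily, just higher than the defaults):

  ceph config set mgr mgr/influx/threads 8
  ceph config set mgr mgr/influx/batch_size 10000
  # restart the module so it picks up the new values
  ceph mgr module disable influx
  ceph mgr module enable influx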
In any case, only the following measurements were populated:

  ceph_pg_summary_osd
  ceph_pg_summary_pool

and they do not contain terribly exciting metrics, only the status of PGs for each pool and the number of PGs per OSD. I guess the more interesting metrics described in the documentation (latency, bytes, operations...) should end up in some other measurement.
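(For reference, this is how one can list what actually landed in the database with the 1.x client, same credentials as above:

  influx -host influxdb-dev.cloud.garr.it -database cephct \
         -username cephctusr -password '****' \
         -execute 'SHOW MEASUREMENTS'

)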
I am not particularly fond of Influx; I am just looking for "something" (Influx? Telegraf?) to store metrics and eventually plot them in Grafana, to replace the current Zabbix-based solution. Some time ago I experimented with Prometheus with some satisfaction, although it requires a scraper, which I would be happy to avoid, especially given the point below. An additional constraint is that I have at least 3 distinct Ceph production clusters to monitor, so I need a simple way to differentiate them (see the sketch below).
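If I do end up back on Prometheus, my current idea for telling the clusters apart is to enable the built-in exporter on each cluster and attach a per-cluster label at scrape time, roughly like this (a sketch only; host names and label values are placeholders, and the exporter port is 9283 by default if I remember correctly):

  # on each cluster
  ceph mgr module enable prometheus

  # in prometheus.yml on the scraping host
  scrape_configs:
    - job_name: 'ceph'
      static_configs:
        - targets: ['mon1.cluster-one.example:9283']
          labels:
            cluster: 'cluster-one'
        - targets: ['mon1.cluster-two.example:9283']
          labels:
            cluster: 'cluster-two'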
How are you dealing with these matters, namely storing configuration and metrics "somewhere"?
Thanks a lot! (for your patience in reading this, at least)

    Fulvio

--
Fulvio Galeazzi
GARR-Net Department
tel.: +39-334-6533-250
skype: fgaleazzi70