Re: problem with mgr prometheus module

Dario Graña <dgrana@xxxxxx> · Thu, 6 Jun 2024 08:56:01 +0200

At the moment I've found that the mgr daemon works fine when I move it to an OSD node. All nodes have the same OS version, so I can conclude that the problem is limited to the nodes that normally run mgr. I'm still investigating what's happening, but at least I got the monitoring back.

Regards.

On Tue, Jun 4, 2024 at 4:01 PM Dario Graña <dgrana@xxxxxx> wrote:
Hi all!

I'm running Ceph Quincy 17.2.7 in a cluster. On Monday I updated the OS from AlmaLinux 9.3 to 9.4. Since then, Grafana shows a "No Data" message in all Ceph-related fields, while other data, for example the node information (Host Detail Dashboard), is still fine.
I have redeployed the mgr service with cephadm and disabled and re-enabled the mgr prometheus module, but nothing changed. Digging into the problem, I accessed the Prometheus interface and found an error.
When I access the node shown as down, it reports:
503 Service Unavailable
No cached data available yet
Traceback (most recent call last):
  File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line 638, in respond
    self._do_respond(path_info)
  File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line 697, in _do_respond
    response.body = self.handler()
  File "/lib/python3.6/site-packages/cherrypy/lib/encoding.py", line 219, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1751, in metrics
    return self._metrics(_global_instance)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1762, in _metrics
    raise cherrypy.HTTPError(503, 'No cached data available yet')
cherrypy._cperror.HTTPError: (503, 'No cached data available yet')
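
The check above can also be scripted. The following is a minimal sketch, not part of Ceph: the helper names `classify` and `probe` are hypothetical, and the `/metrics` path is assumed from the `metrics` handler named in the traceback. It only separates "the exporter answers but has no cached data" from "the exporter is unreachable".

```python
import urllib.error
import urllib.request


def classify(status: int, body: str) -> str:
    """Map an exporter response to a rough diagnosis (illustrative categories)."""
    if status == 200:
        return "ok"
    if status == 503 and "No cached data available yet" in body:
        return "exporter up, no cached metrics"
    return f"unexpected HTTP {status}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch the metrics URL once and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the body is still readable.
        return classify(err.code, err.read().decode("utf-8", "replace"))
    except OSError as err:
        # Covers URLError, refused connections, and timeouts.
        return f"unreachable: {err}"
```

For example, `probe("http://192.168.97.51:9283/metrics")` would query the address and port that appear in the netstat output further down in this message; adjust it for the node that currently runs the active mgr.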
I checked the mgr prometheus address and port:
[ceph: root@ceph-admin01 /]# ceph config get mgr mgr/prometheus/server_addr
::
[ceph: root@ceph-admin01 /]# ceph config get mgr mgr/prometheus/server_port
9283

It seems to be ok.

When I checked the master manager node for the port, I found:
[root@ceph-hn01 ~]# netstat -natup | grep 9283
tcp6       0      0 :::9283                 :::*                    LISTEN      2453/ceph-mgr
tcp6       0      0 192.168.97.51:9283      192.168.97.60:36130     ESTABLISHED 2453/ceph-mgr

I don't understand why it is shown as IPv6; the node doesn't have a dual stack.
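
One possible explanation, not confirmed for this cluster: on Linux, a socket bound to the wildcard address `::` (the `server_addr` value shown above) with the `IPV6_V6ONLY` option disabled also accepts IPv4 connections, and netstat lists such a socket as `tcp6` even when all peers are IPv4. The sketch below reproduces that behavior in isolation. The function name `dual_stack_peer` is hypothetical, and whether ceph-mgr sets up its listener in exactly this way is an assumption.

```python
import socket


def dual_stack_peer():
    """Connect over IPv4 to a listener bound to '::' and return the peer
    address as seen by the listener, or None if dual-stack is unavailable."""
    if not socket.has_dualstack_ipv6():
        return None
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as server:
            # Allow IPv4 clients on the IPv6 wildcard socket.
            server.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            server.bind(("::", 0))  # port 0: let the kernel pick a free port
            server.listen(1)
            server.settimeout(5)
            port = server.getsockname()[1]
            # Plain IPv4 client connecting to the IPv6 listener.
            with socket.create_connection(("127.0.0.1", port), timeout=5):
                conn, peer = server.accept()
                conn.close()
                return peer[0]
    except OSError:
        # IPv6 disabled or otherwise unusable on this host.
        return None


if __name__ == "__main__":
    print(dual_stack_peer())
```

On a host where this works, the listener typically reports the IPv4 client as an IPv4-mapped IPv6 address, which would be consistent with a `tcp6` line carrying IPv4 endpoints as in the output above.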

I also tried a newer version of the prometheus container image, 1.6.0, but it kept reporting the same error, so I rolled it back to the original one.

Has anyone experienced an issue like this?
Where can I look for more information about it?

Thanks in advance.

Regards.
-- 
Dario Graña
PIC (Port d'Informació Científica)
Campus UAB, Edificio D
E-08193 Bellaterra, Barcelona
http://www.pic.es
Avis - Aviso - Legal Notice: http://legal.ifae.es

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx