Ceph metric exporter HTTP Error 500

Hello,

Since we upgraded to Luminous (12.2.2), we have been using the built-in Ceph exporter to get Ceph metrics into Prometheus. At random times the exporter returns an HTTP 500 Internal Server Error, caused by a Python KeyError on a seemingly random metric. Often it is one of the "pg_*" metrics.

Here is an example of the Python exception:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
    metrics = global_instance().collect()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
    self.get_pg_status()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
    self.metrics[path].set(value)
KeyError: 'pg_deep'

After a certain time (sometimes 3-5 minutes, sometimes as long as 40 minutes), the exporter starts serving metrics again without any intervention.
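
The last frame of the traceback is a dictionary lookup, self.metrics[path], failing for a key that is not in the dictionary. To illustrate that pattern, here is a minimal sketch. It is not the actual mgr module code: the Metric class, update_pg_metrics and the strict flag are hypothetical names made up for this example. It only shows how a state name that was never registered leads to a KeyError, and one possible defensive variant that registers unknown keys on first sight.

```python
# Hypothetical stand-in for a metric object; NOT the real exporter class.
class Metric(object):
    def __init__(self, name):
        self.name = name
        self.value = None

    def set(self, value):
        self.value = value


def update_pg_metrics(metrics, pg_states, strict=True):
    """Copy PG state counts into `metrics`, keyed as 'pg_<state>'.

    `metrics` maps metric names to Metric objects; `pg_states` maps
    state names (e.g. 'active', 'deep') to counts. Both are assumptions
    made for this illustration.
    """
    for state, count in pg_states.items():
        path = 'pg_{}'.format(state)
        if strict:
            # Same shape as the failing line in the traceback: raises
            # KeyError when `path` was never registered.
            metrics[path].set(count)
        else:
            # Defensive variant: create the metric if it is missing.
            metrics.setdefault(path, Metric(path)).set(count)


if __name__ == '__main__':
    registered = {'pg_active': Metric('pg_active')}
    try:
        update_pg_metrics(registered, {'active': 10, 'deep': 1})
    except KeyError as exc:
        print('KeyError: {}'.format(exc))
    update_pg_metrics(registered, {'active': 10, 'deep': 1}, strict=False)
    print(sorted(registered))
```

If the real module behaves like the strict branch, that would fit the errors appearing only while some PG is in a state the exporter did not expect, but that is a guess based on the traceback alone.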


Does anyone have an idea what could be done about this, or has anyone experienced similar problems?

Thanks,
Falk

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com