Hello,

Since we upgraded to Luminous (12.2.2), we have been using the built-in Ceph
exporter (the mgr prometheus module) to get Ceph metrics into Prometheus. At
random times we get an Internal Server Error from the exporter, with Python
raising a KeyError for some seemingly random metric. Most often it is one of
the "pg_*" metrics. Here is an example of the Python exception:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
    metrics = global_instance().collect()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
    self.get_pg_status()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
    self.metrics[path].set(value)
KeyError: 'pg_deep'

After a while (anywhere from 3-5 minutes to as long as 40 minutes), the
metrics endpoint starts working again without any intervention.
Thanks,