Hi,

On 12/15/2017 11:53 AM, Falk Mueller-Braun wrote:
> since we upgraded to Luminous (12.2.2), we use the internal Ceph
> exporter for getting the Ceph metrics to Prometheus. At random times we
> get an Internal Server Error from the Ceph exporter, with Python raising a
> KeyError for some random metric. Often it is "pg_*".
>
> Here is an example of the Python exception:
>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
>     response.body = self.handler()
>   File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
>     self.body = self.oldhandler(*args, **kwargs)
>   File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
>     return self.callable(*self.args, **self.kwargs)
>   File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
>     metrics = global_instance().collect()
>   File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
>     self.get_pg_status()
>   File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
>     self.metrics[path].set(value)
> KeyError: 'pg_deep'
>
> After a certain time (sometimes 3-5 minutes, sometimes as long as 40
> minutes), the metrics start being served again without any intervention.
>
> Does anyone have an idea what could be done about this, or is anyone
> experiencing similar problems?

This seems to be a regression in 12.2.2: http://tracker.ceph.com/issues/22441
(which is a duplicate of http://tracker.ceph.com/issues/22116)

And then there's another one that might be related:
http://tracker.ceph.com/issues/22313

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
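The traceback suggests the exporter looks up a pre-registered metric by a name derived from a PG state (`self.metrics[path].set(value)`) and fails when it meets a state it has no metric for. Below is a minimal, hypothetical sketch of that failure mode, not the actual `module.py` code. It assumes PG state strings such as `active+clean+scrubbing+deep` are split on `+` and each part is mapped to a `pg_<state>` gauge; `KNOWN_PG_STATES`, `Gauge`, `update_strict` and `update_tolerant` are illustrative names only. If that assumption holds, it would also explain why the error comes and goes: it would only show up while some PG is in an unregistered state (for example, during a deep scrub).

```python
# Hypothetical, simplified model of the KeyError in the traceback above.
# ASSUMPTION: the exporter pre-registers one gauge per known PG state and
# looks each state up by name on every scrape. Not the real Ceph code.

# Deliberately incomplete list of states: 'deep' is missing.
KNOWN_PG_STATES = ["active", "clean", "scrubbing", "peering"]


class Gauge:
    """Minimal stand-in for a Prometheus gauge."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def set(self, value):
        self.value = value


def build_metrics():
    """Register one gauge per known state, keyed as 'pg_<state>'."""
    return {"pg_" + s: Gauge("pg_" + s) for s in KNOWN_PG_STATES}


def count_states(pg_states):
    """Count PGs per individual state: 'active+clean' -> active, clean."""
    counts = {}
    for state_string, num in pg_states.items():
        for state in state_string.split("+"):
            counts[state] = counts.get(state, 0) + num
    return counts


def update_strict(metrics, pg_states):
    # Mirrors `self.metrics[path].set(value)`: raises KeyError as soon as
    # a state without a registered gauge (e.g. 'deep') appears.
    for state, value in count_states(pg_states).items():
        metrics["pg_" + state].set(value)


def update_tolerant(metrics, pg_states):
    # Defensive variant: skip unknown states and return their metric names.
    skipped = []
    for state, value in count_states(pg_states).items():
        path = "pg_" + state
        if path in metrics:
            metrics[path].set(value)
        else:
            skipped.append(path)
    return skipped


if __name__ == "__main__":
    metrics = build_metrics()
    idle = {"active+clean": 128}
    scrubbing = {"active+clean": 127, "active+clean+scrubbing+deep": 1}

    update_strict(metrics, idle)  # no deep scrub running: works
    try:
        update_strict(metrics, scrubbing)
    except KeyError as err:
        print("KeyError:", err)  # KeyError: 'pg_deep'
    print(update_tolerant(metrics, scrubbing))  # ['pg_deep']
```

The tolerant variant only hides the symptom: the missing state simply goes unreported. A proper fix would register a gauge for every state the cluster can emit, which is what the tracker issues above are about.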
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com