Re: Module 'devicehealth' has failed


 



I will provide any info you need, just let me know.

My original post was about 19.2.0. I have since downgraded to 18.2.4 (a full reinstall, since this is a brand-new cluster I want to run) and it is the same story:

Mar 06 09:37:41 node1.ec.mts conmon[10588]: failed to collect metrics:
Mar 06 09:37:41 node1.ec.mts conmon[10588]: Traceback (most recent call last):
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/usr/share/ceph/mgr/prometheus/module.py", line 514, in collect
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     data = self.mod.collect()
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/usr/share/ceph/mgr/mgr_util.py", line 862, in wrapper
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     result = f(*args, **kwargs)
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/usr/share/ceph/mgr/prometheus/module.py", line 1719, in collect
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     self.get_metadata_and_osd_status()
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/usr/share/ceph/mgr/mgr_util.py", line 862, in wrapper
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     result = f(*args, **kwargs)
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/usr/share/ceph/mgr/prometheus/module.py", line 1138, in get_metadata_and_osd_status
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     osd_map = self.get('osd_map')
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/usr/share/ceph/mgr/mgr_module.py", line 1401, in get
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     obj = json.loads(obj)
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/lib64/python3.9/json/__init__.py", line 346, in loads
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     return _default_decoder.decode(s)
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/lib64/python3.9/json/decoder.py", line 337, in decode
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
Mar 06 09:37:41 node1.ec.mts conmon[10588]:   File "/lib64/python3.9/json/decoder.py", line 355, in raw_decode
Mar 06 09:37:41 node1.ec.mts conmon[10588]:     raise JSONDecodeError("Expecting value", s, err.value) from None
Mar 06 09:37:41 node1.ec.mts conmon[10588]: json.decoder.JSONDecodeError: Expecting value: line 1 column 2311 (char 2310)


A bit more info, in case it helps. This is a cluster of 6 nodes with 116 OSDs each (696 OSDs total). With 5 nodes there was no error; the error appeared when the 6th node was added. Could the high OSD density be causing it?
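
Since the traceback ends in json.loads() failing at a specific offset (char 2310), one way to narrow this down would be to look at the raw text around that offset. Below is a minimal sketch, not anything taken from the Ceph code: show_decode_error_context is a hypothetical helper, and the sample string is made up purely to trigger the same "Expecting value" message. It assumes the raw OSD map JSON can be captured somehow, for example with "ceph osd dump --format json", which may not be byte-identical to what the mgr module receives internally.

```python
import json


def show_decode_error_context(raw: str, width: int = 40) -> str:
    """Return the text surrounding the position where json.loads fails.

    Returns an empty string if `raw` parses cleanly.
    (Hypothetical helper for illustration, not part of Ceph.)
    """
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        # err.pos is the 0-based character offset reported as "char N"
        start = max(err.pos - width, 0)
        end = min(err.pos + width, len(raw))
        return f"{err.msg} at char {err.pos}: {raw[start:end]!r}"
    return ""


# Made-up input: a bare lowercase `nan` is not a valid JSON value.
sample = '{"osd": 1, "weight": nan}'
print(show_decode_error_context(sample, width=10))
```

Running the same helper on the captured OSD map text would show which field sits at the failing offset, which might hint at whether the 6th node introduced a value the JSON parser cannot handle.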


