Re: Ceph orch commands failing with Error ENOENT: Module not found

With the help of a coworker, we were able to get the orchestrator functional again. The issue turned out to be malformed JSON entries in the node exporter fields and in the Grafana cert/key entries. I had added the Grafana certs back in v17, but I never touched any of the exporter certs.

I don't know how the process for suggesting changes works, so if someone can point me in the right direction, I will suggest that the ceph config-key entries be checked and loaded individually, to give the orchestrator the best chance of starting. Failing that, there should at least be some output when debug is on that tells you which keys are actually affected, rather than just a general error such as "self.known_certs[entity] = json.loads(v)". Knowing the value of v would have saved me a lot of time.
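Until something like that exists upstream, the affected entries can be found from outside the mgr. Here is a minimal, hypothetical helper (not part of Ceph) that takes the JSON printed by `ceph config-key dump`, runs `json.loads` on each value under a prefix, and reports every key that fails along with a preview of its value. The default prefix `mgr/cephadm/cert_store.` is an assumption about where the cephadm cert/key store keeps its entries in 19.2; check it against `ceph config-key ls` on your own cluster before relying on it.

```python
import json
from typing import Dict, List, Tuple

# ASSUMPTION: cephadm's cert/key store entries live under this config-key
# prefix in 19.2; verify with `ceph config-key ls` before relying on it.
DEFAULT_PREFIX = "mgr/cephadm/cert_store."


def find_malformed_json(entries: Dict[str, str],
                        prefix: str = DEFAULT_PREFIX) -> List[Tuple[str, str]]:
    """Return (key, error) for each entry under `prefix` whose value is not valid JSON."""
    bad: List[Tuple[str, str]] = []
    for key in sorted(entries):
        if not key.startswith(prefix):
            continue  # only check keys that are expected to hold JSON
        value = entries[key]
        try:
            json.loads(value)
        except json.JSONDecodeError as exc:
            # Name the key and show the start of the value, which the
            # mgr traceback does not do.
            preview = repr(value[:40]) if value else "<empty>"
            bad.append((key, f"{exc.msg} at char {exc.pos}; value starts with {preview}"))
    return bad


def format_report(dump_text: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Parse `ceph config-key dump` output and return a human-readable report."""
    entries = json.loads(dump_text)
    lines = [f"{key}: {err}" for key, err in find_malformed_json(entries, prefix)]
    return "\n".join(lines) if lines else "no malformed entries found"
```

For example, save the dump with `ceph config-key dump > dump.json` and then run `print(format_report(open("dump.json").read()))`. The helper only reads; fixing or removing a bad entry (for example with `ceph config-key set` or `ceph config-key rm`) is a separate, deliberate step, and it is worth keeping a copy of the original value first.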

From: Laimis Juzeliūnas <laimis.juzeliunas@xxxxxxxxxx>
Sent: Sunday, January 12, 2025 8:51 AM
To: Frank Frampton <Frank.Frampton@xxxxxxxxxxxxxx>; zac.dover@xxxxxxxxx
Cc: ceph-users@xxxxxxx
Subject: Re:  Ceph orch commands failing with Error ENOENT: Module not found

Hi Frank,

It seems you are hitting the balancer bug in 19.2 that is common with larger PG counts (the same one mentioned in the tracker). A fix is making its way through the final(?) stages of the 19.2.1 release.

Unfortunately, the only option for now is to keep the balancer off and wait for 19.2.1 to arrive.
So far we have managed with manual/cron balancing using:
https://github.com/laimis9133/plankton-swarm (our own Swiss Army knife)
https://github.com/TheJJ/ceph-balancer
along with some help from https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py


Adding Zac directly here to bring attention once more to the issue:
Users attempting to upgrade to 19.2.0 should be aware of possible balancer issues in the documentation here: https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid


Best,
Laimis J.


On 6 Jan 2025, at 21:58, Frank Frampton <Frank.Frampton@xxxxxxxxxxxxxx> wrote:

I recently upgraded from 18.2 to 19.2, and the upgrade itself went fine. Since the upgrade and a manager failover, I can no longer run orchestrator commands. The following is the only error I can find on the active manager daemon, or at least the only one that stands out:

2025-01-06T18:48:41.698+0000 7fcf42b99640 -1 mgr load Failed to construct class in 'cephadm'
2025-01-06T18:48:41.698+0000 7fcf42b99640 -1 mgr load Traceback (most recent call last):
 File "/usr/share/ceph/mgr/cephadm/module.py", line 667, in __init__
   self.cert_key_store.load()
 File "/usr/share/ceph/mgr/cephadm/inventory.py", line 2073, in load
   self.known_certs[entity] = json.loads(v)
 File "/lib64/python3.9/json/__init__.py", line 346, in loads
   return _default_decoder.decode(s)
 File "/lib64/python3.9/json/decoder.py", line 337, in decode
   obj, end = self.raw_decode(s, idx=_w(s, 0).end())
 File "/lib64/python3.9/json/decoder.py", line 355, in raw_decode
   raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

2025-01-06T18:48:41.702+0000 7fcf42b99640 -1 mgr operator() Failed to run module in active mode ('cephadm')
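The traceback shows a single `json.loads(v)` call inside the cert store's `load()` loop raising out of `__init__`, so one bad value stops the whole cephadm module from constructing. The following is a hypothetical, tolerant version of such a loop, written only to illustrate per-entry loading; it is not cephadm's actual code. The `stored` mapping stands in for whatever the module reads from its store, and the `cert_store.cert.` prefix in the example is an assumption.

```python
import json
import logging
from typing import Dict, Tuple

log = logging.getLogger("cert_store_sketch")  # hypothetical logger name


def load_known_certs(stored: Dict[str, str],
                     prefix: str) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Decode each stored entry on its own; skip bad ones instead of aborting.

    Returns (known, skipped): the decoded entries keyed by entity name, and
    an error message for every entity whose value was not valid JSON.
    """
    known: Dict[str, object] = {}
    skipped: Dict[str, str] = {}
    for key, value in stored.items():
        entity = key[len(prefix):] if key.startswith(prefix) else key
        try:
            known[entity] = json.loads(value)
        except json.JSONDecodeError as exc:
            # Name the entity and the position, which the stock traceback omits.
            skipped[entity] = f"{exc.msg} (char {exc.pos})"
            log.error("skipping malformed cert store entry %r: %s", entity, exc)
    return known, skipped
```

Skipping a bad entry lets the rest of the module start, at the cost of the affected service running without its stored cert until the entry is fixed; logging the entity name is what would have pointed straight at the Grafana and node exporter entries.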

For anything to really work in the dashboard, I must have the balancer off. While the balancer is off, I can make changes to the orchestrator in the dashboard and it doesn't give me any trouble.

When I try commands from a Ceph node, for example "ceph cephadm config-check status", it returns "Error ENOTSUP: Module 'cephadm' is not enabled/loaded (required by command 'cephadm config-check status'): use `ceph mgr module enable cephadm` to enable it". Running "ceph mgr module enable cephadm" returns "module 'cephadm' is already enabled". Any "ceph orch" command results in "Error ENOENT: Module not found". I really don't know where to look or what to try to resolve this.

I don't know whether my issue is related to https://tracker.ceph.com/issues/68657, but it might be.

I have tried the following.
Manually added a new mgr daemon on a different node; it starts and runs the dashboard fine, but things are still not functional.
Failed the mgr several times.
Disabled/Enabled balancer.
Disabled/Enabled mgr modules.
Disabled/Enabled dashboard.

All physical nodes are running Debian 12.




Frank Frampton
Senior Network Services Administrator
Salt Lake City School District
Desk: (801) 578-8223
Follow the district: Facebook <https://www.facebook.com/slcschools> | Instagram <https://instagram.com/slcschools> | Twitter <https://twitter.com/slcschools> | www.slcschools.org <http://www.slcschools.org/>
Excellence and Equity: every student, every classroom, every day
Scanned By Microsoft EOP
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
