Re: Cephadm - Error ENOENT: Module not found

Adam King <adking@xxxxxxxxxx> · Thu, 30 Mar 2023 11:12:10 -0400

For the specific issue in that traceback, you can probably resolve it by
removing the stored upgrade state. I believe we store it at
`mgr/cephadm/upgrade_state` (if that doesn't work, check "ceph config-key ls"
and look for a key related to upgrade state), so running
"ceph config-key rm mgr/cephadm/upgrade_state" should remove the old one.
Then I'd manually downgrade the mgr daemons to avoid this happening again
(the process is roughly the same as
https://docs.ceph.com/en/quincy/cephadm/upgrade/#upgrading-to-a-version-that-supports-staggered-upgrade-from-one-that-doesn-t).
At that point you should be able to try an upgrade command again.
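
To illustrate why removing the key should help, here is a minimal Python
sketch of the failure pattern visible in the traceback. It is not the actual
cephadm code: the `UpgradeState` class below is a simplified stand-in, its
two fields and the `load_upgrade_state` helper are hypothetical, and the
sample values are made up. Only the `cls(**c)` pattern and the `daemon_types`
key come from the log.

```python
import json
from typing import Optional


class UpgradeState:
    """Simplified stand-in for the older module's state class.

    The two fields are illustrative only, not the real field list.
    """

    def __init__(self, target_name: str, progress_id: str) -> None:
        self.target_name = target_name
        self.progress_id = progress_id

    @classmethod
    def from_json(cls, data: dict) -> "UpgradeState":
        # Same pattern as in the traceback: every stored key is passed
        # to the constructor as a keyword argument.
        return cls(**data)


def load_upgrade_state(stored: Optional[str]) -> Optional[UpgradeState]:
    """Hypothetical helper: parse the stored state if the key exists."""
    if stored is None:
        # Key absent, for example after it was removed from the store.
        return None
    return UpgradeState.from_json(json.loads(stored))


# State as a newer release might have written it: it carries a field
# ('daemon_types') that the older class does not accept.
newer_state = json.dumps({
    "target_name": "example-image",
    "progress_id": "example-id",
    "daemon_types": ["mgr"],
})

try:
    load_upgrade_state(newer_state)
    outcome = "loaded"
except TypeError as exc:
    outcome = f"failed: {exc}"

print(outcome)                    # the older class rejects the unknown key
print(load_upgrade_state(None))   # with no stored state, loading succeeds
```

With the key gone, the older mgr has nothing to parse, so the constructor
that raised in the traceback is no longer reached.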

On Thu, Mar 30, 2023 at 11:07 AM <elia.oggian@xxxxxxx> wrote:

> Hello,
> After a successful upgrade of a Ceph cluster from 16.2.7 to 16.2.11, I
> needed to downgrade it back to 16.2.7 as I found an issue with the new
> version.
>
> I expected that running the downgrade with `ceph orch upgrade start
> --ceph-version 16.2.7` would work fine. However, it stalled right
> after the downgrade of the first MGR daemon. In fact, the downgraded daemon
> is not able to use the cephadm module anymore. Any `ceph orch` command
> fails with the following error:
>
> ```
> $ ceph orch ps
> Error ENOENT: Module not found
> ```
> And the downgrade process is therefore blocked.
>
> These are the logs of the MGR when issuing the command:
>
> ```
> Mar 28 12:13:15 astano03
> ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]:
> debug 2023-03-28T10:13:15.557+0000 7f828fe8c700  0 log_channel(audit) log
> [DBG] : from='client.3136173 -' entity='client.admin' cmd=[{"prefix": "orch
> ps", "target": ["mon-mgr", ""]}]: dispatch
> Mar 28 12:13:15 astano03
> ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]:
> debug 2023-03-28T10:13:15.558+0000 7f829068d700  0 [orchestrator DEBUG
> root] _oremote orchestrator -> cephadm.list_daemons(*(None, None),
> **{'daemon_id': None, 'host': None, 'refresh': False})
> Mar 28 12:13:15 astano03
> ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]:
> debug 2023-03-28T10:13:15.558+0000 7f829068d700 -1 no module 'cephadm'
> Mar 28 12:13:15 astano03
> ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]:
> debug 2023-03-28T10:13:15.558+0000 7f829068d700  0 [orchestrator DEBUG
> root] _oremote orchestrator -> cephadm.get_feature_set(*(), **{})
> Mar 28 12:13:15 astano03
> ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]:
> debug 2023-03-28T10:13:15.558+0000 7f829068d700 -1 no module 'cephadm'
> Mar 28 12:13:15 astano03
> ceph-c57586c4-8e44-11eb-a116-248a07aa8d2e-mgr-astano03-qtzccn[2232770]:
> debug 2023-03-28T10:13:15.558+0000 7f829068d700 -1 mgr.server reply reply
> (2) No such file or directory Module not found
> ```
>
> Other interesting MGR logs are:
> ```
>  2023-03-28T11:05:59.519+0000 7fcd16314700  4 mgr get_store get_store key:
> mgr/cephadm/upgrade_state
>  2023-03-28T11:05:59.519+0000 7fcd16314700 -1 mgr load Failed to construct
> class in 'cephadm'
>  2023-03-28T11:05:59.519+0000 7fcd16314700 -1 mgr load Traceback (most
> recent call last):
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 450, in __init__
>     self.upgrade = CephadmUpgrade(self)
>   File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 111, in __init__
>     self.upgrade_state: Optional[UpgradeState] = UpgradeState.from_json(json.loads(t))
>   File "/usr/share/ceph/mgr/cephadm/upgrade.py", line 92, in from_json
>     return cls(**c)
> TypeError: __init__() got an unexpected keyword argument 'daemon_types'
>
>  2023-03-28T11:05:59.521+0000 7fcd16314700 -1 mgr operator() Failed to run
> module in active mode ('cephadm')
> ```
> These errors seem to relate to the new staggered-upgrade feature.
>
> Please note that before, everything was working fine with version 16.2.7.
>
> I am currently stuck in this situation, with only one MGR daemon on version
> 16.2.11, which is the only one still working fine:
>
> ```
> [root@astano01 ~]# ceph orch ps | grep mgr
> mgr.astano02.mzmewn                    astano02  *:8443,9283  running
> (5d)     43s ago   2y     455M        -  16.2.11  7a63bce27215  e2d7806acf16
> mgr.astano03.qtzccn                    astano03  *:8443,9283  running
> (3m)     22s ago  95m     383M        -  16.2.7   463ec4b1fdc0  cc0d88864fa1
> ```
>
> Has anyone already faced this issue, or does anyone know how I can make
> the 16.2.7 MGR load the cephadm module correctly?
>
> Thanks in advance for any help!
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx