Re: Cephadm - Error ENOENT: Module not found

Hi Adam, sorry for the very late reply. 

I also found out that the "mgr/cephadm/upgrade_state" config key was the issue. I actually just modified the config key and removed the unknown fields. This made "ceph orch" commands work again. Great.
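For the record, this is roughly how we did it; treat it as a sketch rather than an exact transcript, and note the file name is just an example (back up the key before touching it):

```
# Dump the cephadm upgrade state to a file (keep a backup copy)
ceph config-key get mgr/cephadm/upgrade_state > upgrade_state.json

# Edit upgrade_state.json and strip the fields the older mgr does not
# recognise, then write the cleaned JSON back
ceph config-key set mgr/cephadm/upgrade_state -i upgrade_state.json

# Restart the active mgr so the cephadm module re-reads the key
ceph mgr fail
```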

However, the downgrade process quickly got stuck on another step, with a similar message:

```
2023-03-28T16:05:40.820967+0200 mgr.naret-monitor02.ciqvgv [WRN] unable to load cached state for naret-monitor01: DaemonDescription: __init__() got an unexpected keyword argument 'cpu_percentage'
2023-03-28T16:05:40.821262+0200 mgr.naret-monitor02.ciqvgv [WRN] unable to load cached state for naret-monitor02: DaemonDescription: __init__() got an unexpected keyword argument 'cpu_percentage'
2023-03-28T16:05:40.821499+0200 mgr.naret-monitor02.ciqvgv [WRN] unable to load cached state for naret-monitor03: DaemonDescription: __init__() got an unexpected keyword argument 'cpu_percentage'
2023-03-28T16:05:40.822704+0200 mgr.naret-monitor02.ciqvgv [WRN] unable to load cached state for naret-osd01: DaemonDescription: __init__() got an unexpected keyword argument 'cpu_percentage'
2023-03-28T16:05:40.823935+0200 mgr.naret-monitor02.ciqvgv [WRN] unable to load cached state for naret-osd02: DaemonDescription: __init__() got an unexpected keyword argument 'cpu_percentage'
2023-03-28T16:05:40.825008+0200 mgr.naret-monitor02.ciqvgv [WRN] unable to load cached state for naret-osd03: DaemonDescription: __init__() got an unexpected keyword argument 'cpu_percentage'
```
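We did not dig much further, but for anyone debugging the same warning: as far as I understand, the cached daemon state that triggers these messages lives in the mgr config-key store under per-host keys (something like mgr/cephadm/host.<hostname>), so it can at least be inspected there, e.g.:

```
# List the per-host cache keys written by cephadm (key layout assumed,
# adjust to what your cluster actually shows)
ceph config-key ls | grep 'mgr/cephadm/host'

# Dump the cached state for one host to look for the offending
# 'cpu_percentage' field written by the newer mgr
ceph config-key get mgr/cephadm/host.naret-monitor01 | python3 -m json.tool
```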

I did not find a solution to this, and since the downgrade process was stuck again, we finally decided to move forward and try a major upgrade to Quincy, version 17.2.5.

The upgrade succeeded and everything is working fine now. 
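For reference, the upgrade itself followed the standard orchestrator workflow, roughly like this (stop the stuck downgrade first; adapt the version and registry to your setup):

```
# Abort the stuck downgrade, then start the orchestrated upgrade to 17.2.5
ceph orch upgrade stop
ceph orch upgrade start --ceph-version 17.2.5

# Follow the progress
ceph orch upgrade status
ceph -s
```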

It seems that downgrades are not supported and, unfortunately, testing the upgrade in a test environment is not always sufficient. We tested the upgrade on a small test cluster and everything went fine, but when we ran it on a much bigger production cluster this issue showed up: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/53NNVLLKLMYVNF32IP4CHRPEIPTMKMLX/#53NNVLLKLMYVNF32IP4CHRPEIPTMKMLX and we were therefore blocked.

Luckily we found a solution in the end :)  

Thanks anyway for your help!

Best Regards,
Elia Oggian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


