Greetings mailing list!

I spent the last 2 months researching and testing the best way to convert over to cephadm from both ceph-ansible and ceph-deploy, and this past Sunday I tried to upgrade and convert a cluster. The upgrade from Nautilus to Octopus 15.2.16 went fine after I removed ceph-dashboard, and all nodes came up as expected. I then converted Octopus over to cephadm management (staying on Octopus during the conversion). This particular cluster was an Ansible OS prep with a ceph-deploy install, running CentOS 7 (updated to the latest build), which is the reason I converted Octopus to cephadm-managed Octopus first.

After successfully adopting the mons, mgrs and osds, it came time to push out the rgws. Having played with Quincy, I decided to create an rgw service and manually list the nodes. After waiting about 30 minutes, I noticed the gateways were not loaded. I had successfully added the rgw and iscsi hosts just like the mons and osds, but for some reason it wasn't pushing the image to the rgws. When I checked podman on the rgw nodes, there were no containers running, and the log didn't show any reason for them not being deployed, so I thought it was the service. I deleted the newly created rgw service and decided it was best to upgrade to Quincy before retrying the rgw deployment, since I have a working cluster that used the cephadm deployment for rgws. I also noticed several features missing from the Octopus dashboard, which backed up my decision to upgrade.

I started the upgrade and then, when I checked the status of the upgrade, noticed the cluster was rebalancing. It wasn't doing that before I started the upgrade, but since that process had started, I decided to cancel the upgrade and let the rebalance finish. This is where the trouble started. Before stopping the upgrade, I checked the upgrade status and saw that it was still blank. I then stopped the upgrade. After running into an issue with the dashboard no longer loading, I discovered through the versions command that two of the mgrs had upgraded to Quincy and the third had not. The monitors were not upgraded on any nodes (basically just the two mgrs had upgraded). The upgrade did stop, so I waited for the rebalance to finish and tried to start the upgrade again.

Issue at hand: The upgrade will not start again. The rgw service, while showing as deleting in the "ceph orch ls" output, was not actually deleting. Since two of the mgrs upgraded, I was able to load the new dashboard by manually failing over to one of the upgraded mgr nodes, but the dashboard will not load the services pages; it just returns a 500 error.
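For reference, the conversion and upgrade steps looked roughly like this. This is a sketch from memory rather than a copy of my shell history; the service name, placement hosts and mgr name are placeholders, and on Octopus I believe the rgw apply command also wanted realm/zone arguments:

    # adopt the legacy daemons on each node (repeated per host / per daemon)
    cephadm adopt --style legacy --name mon.<hostname>
    cephadm adopt --style legacy --name mgr.<hostname>
    cephadm adopt --style legacy --name osd.<osd-id>

    # add the rgw/iscsi hosts, then create the rgw service with an explicit host list
    ceph orch host add <rgw-host>
    ceph orch apply rgw myrgw --placement="rgw1 rgw2 rgw3 rgw4"

    # the gateways never appeared, so remove the service again
    ceph orch rm rgw.myrgw

    # start the Quincy upgrade, check on it, then stop it once the rebalance showed up
    ceph orch upgrade start --ceph-version 17.2.0
    ceph orch upgrade status
    ceph orch upgrade stop

    # afterwards, compare daemon versions and fail over between the old and new mgrs
    ceph versions
    ceph mgr fail <active-mgr-name>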
After failing over, I cannot run the "ceph orch ls" command; the output is this (the same whether or not I fail over to the 3rd mgr):

    Error EINVAL: Traceback (most recent call last):
      File "/usr/share/ceph/mgr/mgr_module.py", line 1701, in _handle_command
        return self.handle_command(inbuf, cmd)
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 171, in handle_command
        return dispatch[cmd['prefix']].call(self, cmd, inbuf)
      File "/usr/share/ceph/mgr/mgr_module.py", line 433, in call
        return self.func(mgr, **kwargs)
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 107, in <lambda>
        wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 96, in wrapper
        return func(*args, **kwargs)
      File "/usr/share/ceph/mgr/orchestrator/module.py", line 575, in _list_services
        services = raise_if_exception(completion)
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 228, in raise_if_exception
        raise e
    KeyError: 'cephadm'

If I fail back over to the Octopus-version mgr, the health output changes from HEALTH_OK to stray daemons and a failed cephadm module notice. The cluster is functional and servicing clients ok, I just can't seem to get it to do any orchestration. I can certainly blow away the third mgr if need be. I also have two more servers ready to go to make it 5 monitors, but deploying the 4th monitor with cephadm doesn't work right now through the dashboard. There are 3 mons which are also mgrs, 4 gateways and 12 osd nodes in the cluster.

I have two more clusters to upgrade like this, so I am thinking it would be best to jump right to Quincy next time instead of messing with the Octopus dashboard; I'm just leery of the CentOS 7 OS possibly causing an issue. I wouldn't think so since this is containers; I have experience with Mesosphere and Docker clusters.

Thoughts on my trainwreck? Many thanks for reading!

Regards,
-Brent

Existing Clusters:
Test: Quincy 17.2.0 (all virtual on nvme)
US Production (HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production (HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
US Production (SSD): Quincy 17.2.0 cephadm with 6 osd servers, 5 mons, 4 gateways, 2 iscsi gateways
UK Production (SSD): Octopus 15.2.14 with 5 osd servers, 3 mons, 4 gateways

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx