Did the 16.2.7 cluster have a non-root ssh user set and a host with an
_admin label? If so, could you try removing the _admin label from the host
and retrying the upgrade? It sounds like https://tracker.ceph.com/issues/54620.

Thanks,
- Adam King

On Fri, Apr 22, 2022 at 7:25 AM Luis Domingues <luis.domingues@xxxxxxxxx> wrote:

> Hello,
>
> We are testing the upgrade path from ceph 16.2.7 to ceph 17.2.0 on a small
> testing cluster.
>
> Basically, we just bootstrap a ceph cluster with cephadm and make sure we
> have 3 mgrs, 3 mons and 6 osds.
>
> Every time we try the upgrade using
> `ceph orch upgrade start --ceph-version 17.2.0`, two of the mgrs get
> upgraded to 17.2.0, but the upgrade stops there. We end up with:
>
> ```
> ceph versions
> {
>     "mon": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 3
>     },
>     "mgr": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 1,
>         "ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)": 2
>     },
>     "osd": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 6
>     },
>     "mds": {},
>     "overall": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 10,
>         "ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)": 2
>     }
> }
> ```
>
> The error we get from `ceph log last cephadm` is:
>
> ```
> 2022-04-22T11:14:36.512558+0000 mgr.ip-10-12-0-68.dyvuxt (mgr.34315) 32 :
> cephadm [ERR] executing
> refresh((['ip-10-12-0-15.eu-central-1.compute.internal',
> 'ip-10-12-0-222.eu-central-1.compute.internal',
> 'ip-10-12-0-250.eu-central-1.compute.internal',
> 'ip-10-12-0-68.eu-central-1.compute.internal',
> 'ip-10-12-0-78.eu-central-1.compute.internal',
> 'ip-10-12-0-85.eu-central-1.compute.internal'],)) failed.
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
>     await asyncssh.scp(f.name, (conn, tmp_path))
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
>     await source.run(srcpath)
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
>     self.handle_error(exc)
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
>     raise exc from None
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
>     await self._send_files(path, b'')
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in _send_files
>     self.handle_error(exc)
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in handle_error
>     raise exc from None
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in _send_files
>     await self._send_file(srcpath, dstpath, attrs)
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in _send_file
>     await self._make_cd_request(b'C', attrs, size, srcpath)
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in _make_cd_request
>     self._fs.basename(path))
>   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in make_request
>     raise exc
> asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
>     return f(*arg)
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 265, in refresh
>     self._write_client_files(client_files, host)
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1052, in _write_client_files
>     self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 238, in write_remote_file
>     host, path, content, mode, uid, gid, addr))
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 569, in wait_async
>     return self.event_loop.get_result(coro)
>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
>     return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
>   File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
>     return self.__get_result()
>   File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
>     raise self._exception
>   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 226, in _write_remote_file
>     raise OrchestratorError(msg)
> orchestrator._interface.OrchestratorError: Unable to write ip-10-12-0-15.eu-central-1.compute.internal:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new: Permission denied
> ```
>
> But if we bootstrap the cluster using 15.2.16 instead of 16.2.7, the
> upgrade goes through without problems on the exact same setup.
>
> I do not know what could cause that. Does anyone have an idea that could
> help me find out what is going wrong when upgrading a fresh 16.2.7 install?
>
> Thanks
>
> Luis Domingues
> Proton AG
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
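
For anyone who wants to script the workaround suggested in the reply at the top of this thread (drop the _admin label, then retry the upgrade), here is a minimal Python sketch. It only builds the `ceph` command lines and does not run them. It assumes that `ceph orch host ls --format json` prints a list of objects with "hostname" and "labels" keys; the helper names and the sample host names (host-a, host-b, host-c) are made up for illustration and are not part of cephadm or of the cluster described above.

```python
import json
from typing import List

ADMIN_LABEL = "_admin"


def hosts_with_admin_label(host_ls_json: str) -> List[str]:
    """Return the hostnames that carry the _admin label.

    Assumption: the input is the output of `ceph orch host ls --format json`,
    i.e. a JSON list of objects with "hostname" and (optionally) "labels".
    """
    hosts = json.loads(host_ls_json)
    return [h["hostname"] for h in hosts if ADMIN_LABEL in (h.get("labels") or [])]


def workaround_commands(admin_hosts: List[str], target_version: str) -> List[List[str]]:
    """Build the CLI calls: remove the label from each host, then start the upgrade."""
    cmds = [
        ["ceph", "orch", "host", "label", "rm", host, ADMIN_LABEL]
        for host in admin_hosts
    ]
    cmds.append(["ceph", "orch", "upgrade", "start", "--ceph-version", target_version])
    return cmds


# Hypothetical host listing, used only to show the shape of the result.
sample_listing = json.dumps([
    {"hostname": "host-a", "labels": ["_admin"]},
    {"hostname": "host-b", "labels": []},
    {"hostname": "host-c"},
])

for cmd in workaround_commands(hosts_with_admin_label(sample_listing), "17.2.0"):
    print(" ".join(cmd))
```

The printed commands can be reviewed and then run by hand or passed to `subprocess.run`. If the label is still wanted once the upgrade has finished, it can presumably be restored with `ceph orch host label add <host> _admin`.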