Re: Ceph upgrade from 16.2.7 to 17.2.0 using cephadm fails

Did the 16.2.7 cluster have a non-root SSH user set and a host with an
_admin label? If so, could you try removing the _admin label from that host
and retrying the upgrade? This sounds like
https://tracker.ceph.com/issues/54620.
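
To see which hosts carry the label, something like the rough sketch below
might help. It assumes that `ceph orch host ls --format json` returns a list
of objects with `hostname` and `labels` keys; the helper name and the sample
hostnames are made up for illustration. It only prints the commands to run,
it does not change anything on the cluster.

```python
import json
from typing import List

ADMIN_LABEL = "_admin"


def admin_label_removals(host_ls_json: str) -> List[str]:
    """Build one `ceph orch host label rm` command per host labelled _admin."""
    hosts = json.loads(host_ls_json)
    return [
        f"ceph orch host label rm {host['hostname']} {ADMIN_LABEL}"
        for host in hosts
        if ADMIN_LABEL in host.get("labels", [])
    ]


# Hypothetical sample; on a real cluster, paste the output of
# `ceph orch host ls --format json` here instead (assumed shape).
sample = json.dumps([
    {"hostname": "host-a", "labels": ["_admin"]},
    {"hostname": "host-b", "labels": []},
])

for command in admin_label_removals(sample):
    print(command)
```

Once the label is removed, the upgrade can be retried as before.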

Thanks,
  - Adam King

On Fri, Apr 22, 2022 at 7:25 AM Luis Domingues <luis.domingues@xxxxxxxxx>
wrote:

> Hello,
>
> We are testing the upgrade path from ceph 16.2.7 to ceph 17.2.0 on a small
> testing cluster.
>
> Basically, we just bootstrap a ceph cluster with cephadm, make sure we
> have 3 mgrs, 3 mons and 6 osds.
>
> Every time we try the upgrade using `ceph orch upgrade start --ceph-version
> 17.2.0`, we get 2 mgrs to 17.2.0, but the upgrade stops there. We end up
> with:
>
> ```
> ceph versions
> {
>     "mon": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 3
>     },
>     "mgr": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 1,
>         "ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)": 2
>     },
>     "osd": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 6
>     },
>     "mds": {},
>     "overall": {
>         "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 10,
>         "ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)": 2
>     }
> }
> ```
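
For watching where an upgrade stalls, a small sketch like the one below can
summarise that output per daemon type. The function name is made up, and the
version strings in the sample are shortened (commit hashes omitted); the
counts are the ones quoted above.

```python
import json
from typing import Dict


def daemons_not_on(versions: Dict[str, Dict[str, int]], target: str) -> Dict[str, int]:
    """Count, per daemon type, the daemons not yet running `target`."""
    pending = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall":
            continue  # aggregate entry, not a daemon type
        behind = sum(
            count
            for version, count in counts.items()
            if f"ceph version {target} " not in version
        )
        if behind:
            pending[daemon_type] = behind
    return pending


# Counts taken from the `ceph versions` output above; version strings shortened.
versions = json.loads("""
{
  "mon": {"ceph version 16.2.7 pacific (stable)": 3},
  "mgr": {"ceph version 16.2.7 pacific (stable)": 1,
          "ceph version 17.2.0 quincy (stable)": 2},
  "osd": {"ceph version 16.2.7 pacific (stable)": 6},
  "mds": {},
  "overall": {"ceph version 16.2.7 pacific (stable)": 10,
              "ceph version 17.2.0 quincy (stable)": 2}
}
""")

print(daemons_not_on(versions, "17.2.0"))  # {'mon': 3, 'mgr': 1, 'osd': 6}
```

Here one mgr, all three mons and all six osds are still on 16.2.7, which
matches an upgrade that stopped partway through the mgr phase.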
>
> The error we get from `ceph log last cephadm` is:
>
> ```
> 2022-04-22T11:14:36.512558+0000 mgr.ip-10-12-0-68.dyvuxt (mgr.34315) 32 :
> cephadm [ERR] executing
> refresh((['ip-10-12-0-15.eu-central-1.compute.internal',
> 'ip-10-12-0-222.eu-central-1.compute.internal',
> 'ip-10-12-0-250.eu-central-1.compute.internal',
> 'ip-10-12-0-68.eu-central-1.compute.internal',
> 'ip-10-12-0-78.eu-central-1.compute.internal',
> 'ip-10-12-0-85.eu-central-1.compute.internal'],)) failed.
> Traceback (most recent call last):
> File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in _write_remote_file
> await asyncssh.scp(f.name, (conn, tmp_path))
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
> await source.run(srcpath)
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
> self.handle_error(exc)
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
> handle_error
> raise exc from None
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
> await self._send_files(path, b'')
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
> _send_files
> self.handle_error(exc)
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
> handle_error
> raise exc from None
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
> _send_files
> await self._send_file(srcpath, dstpath, attrs)
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in
> _send_file
> await self._make_cd_request(b'C', attrs, size, srcpath)
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in
> _make_cd_request
> self._fs.basename(path))
> File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in
> make_request
> raise exc
> asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission
> denied
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
> return f(*arg)
> File "/usr/share/ceph/mgr/cephadm/serve.py", line 265, in refresh
> self._write_client_files(client_files, host)
> File "/usr/share/ceph/mgr/cephadm/serve.py", line 1052, in
> _write_client_files
> self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
> File "/usr/share/ceph/mgr/cephadm/ssh.py", line 238, in write_remote_file
> host, path, content, mode, uid, gid, addr))
> File "/usr/share/ceph/mgr/cephadm/module.py", line 569, in wait_async
> return self.event_loop.get_result(coro)
> File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
> return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
> File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
> return self.__get_result()
> File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in
> __get_result
> raise self._exception
> File "/usr/share/ceph/mgr/cephadm/ssh.py", line 226, in _write_remote_file
> raise OrchestratorError(msg)
> orchestrator._interface.OrchestratorError: Unable to write
> ip-10-12-0-15.eu-central-1.compute.internal:/etc/ceph/ceph.conf: scp:
> /tmp/etc/ceph/ceph.conf.new: Permission denied
> ```
>
> But if we bootstrap the cluster using 15.2.16 instead of 16.2.7, the
> upgrade goes perfectly on the exact same setup.
>
> I do not know what could cause this. Does anyone have an idea that could
> help me find what is going wrong when upgrading a fresh 16.2.7 install?
>
> Thanks
>
> Luis Domingues
> Proton AG
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>