I'm attempting to upgrade my large cephadm-deployed cluster (42 nodes, 1600 OSDs) from Octopus to Pacific. This did not work very well when I first tried it, so I broke off 3 nodes and created a small "test" cluster with only 33 OSDs to see how badly it would fail.

This is an upgrade from 15.2.13 to 16.2.5, started with:

    ceph orch upgrade start --ceph-version 16.2.5

The mgrs and mons upgraded quite quickly, but the very first OSD failed:

8/6/21 1:51:19 PM [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.3 on host drywood10 failed.

8/6/21 1:51:19 PM [ERR] cephadm exited with an error code: 1, stderr:Redeploy daemon osd.3 ...
Traceback (most recent call last):
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
    main()
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 1759, in _default_image
    return func(ctx)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 4326, in command_deploy
    ports=daemon_ports)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2632, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2801, in deploy_daemon_units
    install_sysctl(ctx, fsid, daemon_type)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2963, in install_sysctl
    _write(conf, lines)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2948, in _write
    with open(conf, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-43fd7d2e-f693-11eb-990a-a4bf01112a34-osd.conf'
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1347, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1244, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Redeploy daemon osd.3 ...
Traceback (most recent call last):
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
    main()
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 1759, in _default_image
    return func(ctx)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 4326, in command_deploy
    ports=daemon_ports)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2632, in deploy_daemon
    c, osd_fsid=osd_fsid, ports=ports)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2801, in deploy_daemon_units
    install_sysctl(ctx, fsid, daemon_type)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2963, in install_sysctl
    _write(conf, lines)
  File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2948, in _write
    with open(conf, 'w') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-43fd7d2e-f693-11eb-990a-a4bf01112a34-osd.conf'

The good news is that this is still a pre-production proof-of-concept cluster, so I'm attempting to iron out issues before we try to make it a production service.

Any ideas would be helpful. I guess deploy might be an option, but that does not feel very future-proof.

Thanks

Peter Childs