Re: Cephadm Upgrade from Octopus to Pacific

Yes, that works for me; looping through the nodes and doing the mkdir by
hand is a quick workaround (a sketch follows below). Running

ssh $node mkdir /usr/lib/sysctl.d

on each node does allow the upgrade to complete. As I've probably said
before, I'm using Debian Buster, as CentOS 7 was not 100% happy even with
Octopus (although it just about worked) and CentOS 8 does not support the
hardware I've got (the disks are not detected, and I can't find the right
drivers....)
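
A rough sketch of that loop, for anyone hitting the same thing (it assumes
passwordless SSH from the admin host and a hypothetical nodes.txt listing
the hostnames; once the directory exists everywhere the paused upgrade can
be resumed):

# create the missing sysctl.d directory on every host
for node in $(cat nodes.txt); do
    ssh "$node" mkdir -p /usr/lib/sysctl.d
done
# then let cephadm carry on with the paused upgrade
ceph orch upgrade resume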

I suspect I've still got some tidying up to do before I continue, but this
does look smoother than when I tried with 16.2.0, which was 4 months ago.

Thanks

Peter.

On Fri, 6 Aug 2021 at 14:40, Dimitri Savineau <dsavinea@xxxxxxxxxx> wrote:

> Looks related to https://tracker.ceph.com/issues/51620
>
> Hopefully this will be backported to Pacific and included in 16.2.6
>
> Regards,
>
> Dimitri
>
> On Fri, Aug 6, 2021 at 9:21 AM Arnaud MARTEL <
> arnaud.martel@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Peter,
> >
> > I had the same error and my workaround was to manually create the
> > /usr/lib/sysctl.d directory on all nodes, then resume the upgrade.
> >
> > Arnaud Martel
> > ----- Original Message -----
> > From: "Peter Childs" <pchilds@xxxxxxx>
> > To: "ceph-users" <ceph-users@xxxxxxx>
> > Sent: Friday, 6 August 2021 15:03:20
> > Subject: Cephadm Upgrade from Octopus to Pacific
> >
> > I'm attempting to upgrade my large cephadm-deployed cluster, with 1600
> > OSDs, from Octopus to Pacific.
> >
> > Given this did not work very well when I first tried, I decided to break
> > off 3 nodes and create a small "test" cluster to see how badly it would
> > fail.
> >
> > This is upgrading from 15.2.13 to 16.2.5 on a small 3-node cluster with
> > only 33 OSDs (not 42 nodes and 1600 OSDs), using
> >
> > ceph orch upgrade start --ceph-version 16.2.5
> >
> > So far I've got as far as the first OSD failing: the mgrs and mons
> > upgraded quite quickly, but the very first OSD failed.
> >
> > 8/6/21 1:51:19 PM [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.3 on host drywood10 failed.
> >
> > 8/6/21 1:51:19 PM [ERR] cephadm exited with an error code: 1, stderr: Redeploy daemon osd.3 ...
> > Traceback (most recent call last):
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
> >     main()
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
> >     r = ctx.func(ctx)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 1759, in _default_image
> >     return func(ctx)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 4326, in command_deploy
> >     ports=daemon_ports)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2632, in deploy_daemon
> >     c, osd_fsid=osd_fsid, ports=ports)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2801, in deploy_daemon_units
> >     install_sysctl(ctx, fsid, daemon_type)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2963, in install_sysctl
> >     _write(conf, lines)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2948, in _write
> >     with open(conf, 'w') as f:
> > FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-43fd7d2e-f693-11eb-990a-a4bf01112a34-osd.conf'
> > Traceback (most recent call last):
> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1347, in _remote_connection
> >     yield (conn, connr)
> >   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1244, in _run_cephadm
> >     code, '\n'.join(err)))
> > orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr: Redeploy daemon osd.3 ...
> > Traceback (most recent call last):
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8230, in <module>
> >     main()
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 8218, in main
> >     r = ctx.func(ctx)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 1759, in _default_image
> >     return func(ctx)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 4326, in command_deploy
> >     ports=daemon_ports)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2632, in deploy_daemon
> >     c, osd_fsid=osd_fsid, ports=ports)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2801, in deploy_daemon_units
> >     install_sysctl(ctx, fsid, daemon_type)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2963, in install_sysctl
> >     _write(conf, lines)
> >   File "/var/lib/ceph/43fd7d2e-f693-11eb-990a-a4bf01112a34/cephadm.d4237e4639c108308fe13147b1c08af93c3d5724d9ff21ae797eb4b78fea3931", line 2948, in _write
> >     with open(conf, 'w') as f:
> > FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/sysctl.d/90-ceph-43fd7d2e-f693-11eb-990a-a4bf01112a34-osd.conf'
> >
> >
> > The good news is that this is still a pre-production proof-of-concept
> > cluster, so I'm attempting to iron out issues before we try to make it a
> > production service.
> >
> > Any ideas would be helpful.
> >
> > I guess deploy might be an option, but that does not feel very
> > future-proof.
> >
> >
> > Thanks
> >
> >
> > Peter Childs
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
