Before I started the upgrade the cluster was healthy, but one
OSD (osd.355) was down; I can't remember whether it was in or out.
Upgrade was started with
ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9
The upgrade started but when Ceph tried to upgrade osd.355 it paused
with the following messages:
2021-03-11T09:15:35.638104+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Target is goharbor.example.com/library/ceph/ceph:v15.2.9 with id dfc48307963697ff48acd9dd6fda4a7a24017b9d8124f86c2a542b0802fe77ba
2021-03-11T09:15:35.639882+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mgr daemons...
2021-03-11T09:15:35.644170+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mgr daemons are up to date.
2021-03-11T09:15:35.644376+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mon daemons...
2021-03-11T09:15:35.647669+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mon daemons are up to date.
2021-03-11T09:15:35.647866+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking crash daemons...
2021-03-11T09:15:35.652035+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Setting container_image for all crash...
2021-03-11T09:15:35.653683+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: All crash daemons are up to date.
2021-03-11T09:15:35.653896+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking osd daemons...
2021-03-11T09:15:36.273345+0000 mgr.pech-mon-2.cjeiyc [INF] It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273504+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273887+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Redeploying osd.355
2021-03-11T09:15:36.276673+0000 mgr.pech-mon-2.cjeiyc [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.355 on host pech-hd-009 failed.
One of the first things the upgrade did was to upgrade the mons, so they
were restarted, and now osd.355 no longer exists:
$ ceph osd info osd.355
Error EINVAL: osd.355 does not exist
But if I resume the upgrade with
ceph orch upgrade resume
it still tries to upgrade osd.355, same message as above.
I tried to stop and start the upgrade again with
ceph orch upgrade stop
ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9
it still tries to upgrade osd.355, with the same message as above.
Looking at the source code, it looks like it gets the daemons to upgrade
from the mgr cache, so I restarted both mgrs, but it still tries to
upgrade osd.355.
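
One way to check the stale-cache theory would be to compare the daemons
cephadm still lists against the OSD ids that are actually in the OSD map.
The following is only a minimal sketch, not something I have run on this
cluster: the function name stale_osd_daemons and the temporary file paths
are made up for illustration, and it assumes that the first column of
`ceph orch ps` is the daemon name and that `ceph osd ls` prints one OSD id
per line.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print OSD daemons that
# cephadm still lists but that are absent from the OSD map.
#   $1: file holding the output of `ceph orch ps`
#       (assumption: first column is the daemon name, e.g. osd.355)
#   $2: file holding the output of `ceph osd ls`
#       (assumption: one OSD id per line)
stale_osd_daemons() {
    awk '
        # First file given to awk: OSD ids present in the OSD map.
        FILENAME == ARGV[1] { in_map[$1] = 1; next }
        # Second file: daemon names known to cephadm; keep only osd.N.
        $1 ~ /^osd\.[0-9]+$/ {
            id = substr($1, 5)              # strip the leading "osd."
            if (!(id in in_map)) print $1   # listed by cephadm, not in map
        }
    ' "$2" "$1"
}

# Usage on a live cluster (not run here; paths are placeholders):
#   ceph orch ps > /tmp/orch_ps.txt
#   ceph osd ls  > /tmp/osd_ls.txt
#   stale_osd_daemons /tmp/orch_ps.txt /tmp/osd_ls.txt
```

If osd.355 shows up in that output, cephadm still has a daemon entry for an
OSD that the cluster itself no longer knows about, which would match the
behaviour above.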
Does anyone know how I can get the upgrade to continue?
--
Kai Stian Olstad