Hi,

I didn't notice anything suspicious in the mgr logs, nor in cephadm.log (attaching an extract of the latest). What I have noticed is that the active mgr container gets restarted about every 3 minutes, as reported by "ceph -w":

"""
2022-05-18T15:30:49.883238+0200 mon.naret-monitor01 [INF] Active manager daemon naret-monitor01.tvddjv restarted
2022-05-18T15:30:49.889294+0200 mon.naret-monitor01 [INF] Activating manager daemon naret-monitor01.tvddjv
2022-05-18T15:30:50.832200+0200 mon.naret-monitor01 [INF] Manager daemon naret-monitor01.tvddjv is now available
2022-05-18T15:34:16.979735+0200 mon.naret-monitor01 [INF] Active manager daemon naret-monitor01.tvddjv restarted
2022-05-18T15:34:16.985531+0200 mon.naret-monitor01 [INF] Activating manager daemon naret-monitor01.tvddjv
2022-05-18T15:34:18.246784+0200 mon.naret-monitor01 [INF] Manager daemon naret-monitor01.tvddjv is now available
2022-05-18T15:37:34.576159+0200 mon.naret-monitor01 [INF] Active manager daemon naret-monitor01.tvddjv restarted
2022-05-18T15:37:34.582935+0200 mon.naret-monitor01 [INF] Activating manager daemon naret-monitor01.tvddjv
2022-05-18T15:37:35.821200+0200 mon.naret-monitor01 [INF] Manager daemon naret-monitor01.tvddjv is now available
2022-05-18T15:40:00.000148+0200 mon.naret-monitor01 [INF] overall HEALTH_OK
2022-05-18T15:40:52.456182+0200 mon.naret-monitor01 [INF] Active manager daemon naret-monitor01.tvddjv restarted
2022-05-18T15:40:52.461826+0200 mon.naret-monitor01 [INF] Activating manager daemon naret-monitor01.tvddjv
2022-05-18T15:40:53.787353+0200 mon.naret-monitor01 [INF] Manager daemon naret-monitor01.tvddjv is now available
"""

I'm also attaching the logs of the active mgr process.

The cluster is working fine, but I wonder whether this mgr/cephadm restart behaviour is itself wrong and might be causing the upgrade to stall.

Thanks,
Giuseppe

On 18.05.22, 14:19, "Eugen Block" <eblock@xxxxxx> wrote:

Do you see anything suspicious in /var/log/ceph/cephadm.log?
Also check the mgr logs for any hints.

Zitat von Lo Re Giuseppe <giuseppe.lore@xxxxxxx>:

> Hi,
>
> We have happily tested the upgrade from v15.2.16 to v16.2.7 with
> cephadm on a test cluster made of 3 nodes, and everything went
> smoothly.
> Today we started the very same operation on the production cluster
> (20 OSD servers, 720 HDDs), and the upgrade process doesn't do
> anything at all.
>
> To be more specific, we issued the command
>
> ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.7
>
> and soon after, "ceph -s" reported
>
> Upgrade to quay.io/ceph/ceph:v16.2.7 (0s)
>   [............................]
>
> but only for a few seconds; after that:
>
> [root@naret-monitor01 ~]# ceph -s
>   cluster:
>     id:     63334166-d991-11eb-99de-40a6b72108d0
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 7d)
>     mgr: naret-monitor01.tvddjv(active, since 60s), standbys: naret-monitor02.btynnb
>     mds: cephfs:1 {0=cephfs.naret-monitor01.uvevbf=up:active} 2 up:standby
>     osd: 760 osds: 760 up (since 6d), 760 in (since 2w)
>     rgw: 3 daemons active (cscs-realm.naret-zone.naret-rgw01.qvhhbi, cscs-realm.naret-zone.naret-rgw02.pduagk, cscs-realm.naret-zone.naret-rgw03.aqdkkb)
>
>   task status:
>
>   data:
>     pools:   30 pools, 16497 pgs
>     objects: 833.14M objects, 3.1 PiB
>     usage:   5.0 PiB used, 5.9 PiB / 11 PiB avail
>     pgs:     16460 active+clean
>              37    active+clean+scrubbing+deep
>
>   io:
>     client:   4.7 MiB/s rd, 4.0 MiB/s wr, 122 op/s rd, 47 op/s wr
>
>   progress:
>     Removing image fulen-hdd/c991f6fdf41964 from trash (53s)
>       [............................] (remaining: 81m)
>
> The command "ceph orch upgrade status" says:
>
> {
>     "target_image": "quay.io/ceph/ceph:v16.2.7",
>     "in_progress": true,
>     "services_complete": [],
>     "message": ""
> }
>
> It doesn't even pull the container image.
> I have verified that "podman pull" itself works: I was able to pull
> quay.io/ceph/ceph:v16.2.7.
>
> "ceph -w" and "ceph -W cephadm" don't report any activity related to
> the upgrade.
>
> Has anyone seen anything similar?
> Do you have any advice on how to understand what's preventing the
> upgrade process from actually starting?
>
> Thanks in advance,
>
> Giuseppe
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
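[Editor's note] For readers hitting the same symptoms (a stalled "ceph orch upgrade" plus a restarting active mgr), a minimal diagnostic sketch with standard Ceph/cephadm CLI commands might look like the following. The daemon name mgr.naret-monitor01.tvddjv is taken from this thread; substitute your own from "ceph mgr stat". None of this is confirmed as the fix in this thread; it only surfaces where the cephadm state machine is stuck.

```shell
# Check whether the periodic mgr restarts left crash reports behind.
ceph crash ls

# Raise the cephadm module's cluster log level so that
# "ceph -W cephadm --watch-debug" shows what the upgrade
# state machine is (or is not) doing.
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

# Inspect the logs of the active mgr daemon on its host
# (run on that host; substitute your own daemon name).
cephadm logs --name mgr.naret-monitor01.tvddjv

# If the upgrade is wedged inside the active mgr, failing over
# to the standby mgr often kicks it back into action.
ceph mgr fail

# Afterwards, drop the cephadm log level back to the default.
ceph config set mgr mgr/cephadm/log_to_cluster_level info
```

These commands are read-mostly except for "ceph mgr fail", which triggers a mgr failover and is generally safe on a healthy cluster with a standby mgr, but should be run deliberately.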