Dear Eugen,

I have removed one mgr on node 3. The second one is still crashlooping, and the mgr on node 1 is on 16.2.2.

I am not sure I understand your workaround.

* Stop the current upgrade to roll back, if possible, and afterwards upgrade to the latest release of Pacific?

Best Regards,

Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX
+41 (0)22 869 04 40
www.csti.ch

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, 6 March 2024 10:47
To: ceph-users@xxxxxxx
Subject: Re: Upgarde from 16.2.1 to 16.2.2 pacific stuck

There was another issue when having more than two MGRs, maybe you're hitting that (https://tracker.ceph.com/issues/57675, https://github.com/ceph/ceph/pull/48258). I believe my workaround was to set the global config to a newer image (target version) and then deploy a new mgr.

Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:

> The process has now started, but I have the following error on the mgr
> on the second node:
>
> root@rke-sh1-1:~# ceph orch ps
> NAME                          HOST       PORTS        STATUS         REFRESHED  AGE  VERSION  IMAGE ID      CONTAINER ID
> crash.rke-sh1-1               rke-sh1-1               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  e8652edb2b49
> crash.rke-sh1-2               rke-sh1-2               running (12d)  2s ago     20M  16.2.1   c757e4a3636b  a1249a605ee0
> crash.rke-sh1-3               rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  026667bc1776
> mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1               running (12d)  41s ago    5M   16.2.1   c757e4a3636b  9b4c2b08b759
> mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2               running (12d)  2s ago     23M  16.2.1   c757e4a3636b  71681a5f34d3
> mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3               running (12d)  41s ago    4M   16.2.1   c757e4a3636b  e89946ad6b7e
> mgr.rke-sh1-1.qskoyj          rke-sh1-1  *:8082,9283  running (66m)  41s ago    2y   16.2.2   5e237c38caa6  123cabbc2994
> mgr.rke-sh1-2.lxmguj          rke-sh1-2  *:8082,9283  running (6s)   2s ago     22M  16.2.2   5e237c38caa6  b2a9047be1d6
> mgr.rke-sh1-3.ckunvo          rke-sh1-3  *:8082,9283  running (12d)  41s ago    7M   16.2.1   c757e4a3636b  2fcaf18f3218
> mon.rke-sh1-1                 rke-sh1-1               running (37m)  41s ago    37m  16.2.1   c757e4a3636b  84e63e0415a8
> mon.rke-sh1-2                 rke-sh1-2               running (12d)  2s ago     4M   16.2.1   c757e4a3636b  f4b32ba4466b
> mon.rke-sh1-3                 rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  d5e44c245998
> osd.0                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  7b0e69942c15
> osd.1                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  4451654d9a2d
> osd.10                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  3f9d5f95e284
> osd.11                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  db1cc6d2e37f
> osd.12                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  de416c1ef766
> osd.13                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  25a281cc5a9b
> osd.14                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  62f25ba61667
> osd.15                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  d3514d823c45
> osd.16                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  bba857759bfe
> osd.17                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  59281d4bb3d0
> osd.2                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  418041b5e60d
> osd.3                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  04a0e29d5623
> osd.4                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  1cc78a5153d3
> osd.5                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  39a4b11e31fb
> osd.6                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  2f218ffb566e
> osd.7                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  cf761fbe4d5f
> osd.8                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  f9f85480e800
> osd.9                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  664c54ff46d2
> rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  *:8000       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  f03212b955a7
> rgw.default.rke-sh1-1.vylchc  rke-sh1-1  *:8001       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  da486ce43fe5
> rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  *:8000       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  ef4089d0aef2
> rgw.default.rke-sh1-2.efkbum  rke-sh1-2  *:8001       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  9e053d5a2f7b
> rgw.default.rke-sh1-3.krfgey  rke-sh1-3  *:8001       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  45cd3d75edd3
> rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  *:8000       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  e2710265a7f4
>
> # tail -f /var/log/ceph/fcb373ce-7aaa-11eb-984f-e7c6e0038e87/ceph-mgr.rke-sh1-2.lxmguj.log
> 2024-03-06T09:24:42.468+0000 7fe68b500700 0 [dashboard DEBUG root] setting log level: INFO
> 2024-03-06T09:24:42.468+0000 7fe68b500700 1 mgr load Constructed class from module: dashboard
> 2024-03-06T09:24:42.468+0000 7fe68acff700 0 ms_deliver_dispatch: unhandled message 0x55f722292160 mon_map magic: 0 v1 from mon.0 v2:10.10.71.2:3300/0
> 2024-03-06T09:24:42.468+0000 7fe68b500700 0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
> 2024-03-06T09:24:42.468+0000 7fe68b500700 1 mgr load Constructed class from module: prometheus
> 2024-03-06T09:24:42.468+0000 7fe64110d700 0 [dashboard INFO root] server: ssl=no host=:: port=8082
> 2024-03-06T09:24:42.472+0000 7fe64110d700 0 [dashboard INFO root] Configured CherryPy, starting engine...
> 2024-03-06T09:24:42.472+0000 7fe64110d700 0 [dashboard INFO root] Starting engine...
> 2024-03-06T09:24:42.580+0000 7fe64110d700 0 [dashboard INFO root] Engine started...
> 2024-03-06T09:24:44.020+0000 7f0085fb8500 0 set uid:gid to 167:167 (ceph:ceph)
> 2024-03-06T09:24:44.020+0000 7f0085fb8500 0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
> 2024-03-06T09:24:44.020+0000 7f0085fb8500 0 pidfile_write: ignore empty --pid-file
> 2024-03-06T09:24:44.044+0000 7f0085fb8500 1 mgr[py] Loading python module 'alerts'
> 2024-03-06T09:24:44.156+0000 7f0085fb8500 1 mgr[py] Loading python module 'balancer'
> 2024-03-06T09:24:44.240+0000 7f0085fb8500 1 mgr[py] Loading python module 'cephadm'
> 2024-03-06T09:24:44.484+0000 7f0085fb8500 1 mgr[py] Loading python module 'crash'
> 2024-03-06T09:24:44.568+0000 7f0085fb8500 1 mgr[py] Loading python module 'dashboard'
> 2024-03-06T09:24:45.100+0000 7f0085fb8500 1 mgr[py] Loading python module 'devicehealth'
> 2024-03-06T09:24:45.184+0000 7f0085fb8500 1 mgr[py] Loading python module 'diskprediction_local'
> 2024-03-06T09:24:45.396+0000 7f0085fb8500 1 mgr[py] Loading python module 'influx'
> 2024-03-06T09:24:45.488+0000 7f0085fb8500 1 mgr[py] Loading python module 'insights'
> 2024-03-06T09:24:45.572+0000 7f0085fb8500 1 mgr[py] Loading python module 'iostat'
> 2024-03-06T09:24:45.724+0000 7f0085fb8500 1 mgr[py] Loading python module 'k8sevents'
> 2024-03-06T09:24:46.172+0000 7f0085fb8500 1 mgr[py] Loading python module 'localpool'
> 2024-03-06T09:24:46.260+0000 7f0085fb8500 1 mgr[py] Loading python module 'mds_autoscaler'
> 2024-03-06T09:24:46.416+0000 7f0085fb8500 1 mgr[py] Loading python module 'mirroring'
> 2024-03-06T09:24:46.528+0000 7f0085fb8500 1 mgr[py] Loading python module 'orchestrator'
> 2024-03-06T09:24:46.776+0000 7f0085fb8500 1 mgr[py] Loading python module 'osd_support'
> 2024-03-06T09:24:46.860+0000 7f0085fb8500 1 mgr[py] Loading python module 'pg_autoscaler'
> 2024-03-06T09:24:46.956+0000 7f0085fb8500 1 mgr[py] Loading python module 'progress'
> 2024-03-06T09:24:47.052+0000 7f0085fb8500 1 mgr[py] Loading python module 'prometheus'
> 2024-03-06T09:24:47.524+0000 7f0085fb8500 1 mgr[py] Loading python module 'rbd_support'
> 2024-03-06T09:24:47.640+0000 7f0085fb8500 1 mgr[py] Loading python module 'restful'
> 2024-03-06T09:24:47.924+0000 7f0085fb8500 1 mgr[py] Loading python module 'rook'
> 2024-03-06T09:24:48.536+0000 7f0085fb8500 1 mgr[py] Loading python module 'selftest'
> 2024-03-06T09:24:48.640+0000 7f0085fb8500 1 mgr[py] Loading python module 'snap_schedule'
> 2024-03-06T09:24:48.776+0000 7f0085fb8500 1 mgr[py] Loading python module 'stats'
> 2024-03-06T09:24:48.876+0000 7f0085fb8500 1 mgr[py] Loading python module 'status'
> 2024-03-06T09:24:48.984+0000 7f0085fb8500 1 mgr[py] Loading python module 'telegraf'
> 2024-03-06T09:24:49.088+0000 7f0085fb8500 1 mgr[py] Loading python module 'telemetry'
> 2024-03-06T09:24:49.248+0000 7f0085fb8500 1 mgr[py] Loading python module 'test_orchestrator'
> 2024-03-06T09:24:49.632+0000 7f0085fb8500 1 mgr[py] Loading python module 'volumes'
> 2024-03-06T09:24:49.832+0000 7f0085fb8500 1 mgr[py] Loading python module 'zabbix'
> 2024-03-06T09:24:49.936+0000 7f00739df700 0 [dashboard DEBUG root] setting log level: INFO
> 2024-03-06T09:24:49.936+0000 7f00739df700 1 mgr load Constructed class from module: dashboard
> 2024-03-06T09:24:49.936+0000 7f00731de700 0 ms_deliver_dispatch: unhandled message 0x556eb3224160 mon_map magic: 0 v1 from mon.2 v2:10.10.71.1:3300/0
> 2024-03-06T09:24:49.936+0000 7f00739df700 0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
> 2024-03-06T09:24:49.936+0000 7f00739df700 1 mgr load Constructed class from module: prometheus
> 2024-03-06T09:24:49.936+0000 7f00235e9700 0 [dashboard INFO root] server: ssl=no host=:: port=8082
> 2024-03-06T09:24:49.940+0000 7f00235e9700 0 [dashboard INFO root] Configured CherryPy, starting engine...
> 2024-03-06T09:24:49.940+0000 7f00235e9700 0 [dashboard INFO root] Starting engine...
> 2024-03-06T09:24:50.048+0000 7f00235e9700 0 [dashboard INFO root] Engine started...
> 2024-03-06T09:24:51.584+0000 7f0843ec9500 0 set uid:gid to 167:167 (ceph:ceph)
> 2024-03-06T09:24:51.584+0000 7f0843ec9500 0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
> 2024-03-06T09:24:51.584+0000 7f0843ec9500 0 pidfile_write: ignore empty --pid-file
>
> # cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mgr.rke-sh1-2.lxmguj
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTING
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: CherryPy Checker:
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: The Application mounted at '' has an empty config.
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Serving on http://:::9283
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTED
> Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopping Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87...
> Mar 06 09:27:18 rke-sh1-2 docker[624494]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87-mgr.rke-sh1-2.lxmguj
> Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Main process exited, code=exited, status=143/n/a
> Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Failed with result 'exit-code'.
> Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopped Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
> Mar 06 09:27:19 rke-sh1-2 systemd[1]: Started Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
>
> The mgr.rke-sh1-2.lxmguj daemon is crashlooping.
>
> Do you have an idea of what is going on?
>
> Is it an issue with the dashboard module?
>
> Best Regards,
>
> Edouard FAZENDA
> Technical Support
>
> Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
>
> <https://www.csti.ch/> www.csti.ch
>
> From: Edouard FAZENDA
> Sent: Wednesday, 6 March 2024 09:42
> To: ceph-users@xxxxxxx
> Subject: Upgarde from 16.2.1 to 16.2.2 pacific stuck
>
> Dear Ceph Community,
>
> I am in the process of upgrading Ceph Pacific from 16.2.1 to 16.2.2. I
> have followed the documentation:
> https://docs.ceph.com/en/pacific/cephadm/upgrade/
>
> My cluster is in a healthy state, but the upgrade is not moving forward.
> The cephadm log shows the following:
>
> # ceph -W cephadm
> 2024-03-06T08:39:11.653447+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
> 2024-03-06T08:39:12.281386+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
> 2024-03-06T08:39:12.286096+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
> 2024-03-06T08:39:19.347877+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
> 2024-03-06T08:39:19.347989+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
> 2024-03-06T08:39:19.366355+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
> 2024-03-06T08:39:19.965822+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
> 2024-03-06T08:39:19.969089+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
> 2024-03-06T08:39:26.961455+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
> 2024-03-06T08:39:26.961502+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
> 2024-03-06T08:39:26.973897+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
> 2024-03-06T08:39:27.623773+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
> 2024-03-06T08:39:27.628115+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
>
> My public_network is set:
>
> root@rke-sh1-1:~# ceph config dump | grep public_network
> mon advanced public_network 10.10.71.0/24 *
>
> Do you have an idea why I get the following error?
>
> Filtered out host: could not verify host allowed virtual ips
>
> Current state of the upgrade:
>
> # ceph orch upgrade status
> {
>     "target_image": "docker.io/ceph/ceph@sha256:8cdd8c7dfc7be5865255f0d59c048a1fb8d1335f6923996e2c2d9439499b5cf2",
>     "in_progress": true,
>     "services_complete": [],
>     "progress": "0/35 ceph daemons upgraded",
>     "message": "Currently upgrading mgr daemons"
> }
>
> progress:
>     Upgrade to 16.2.2 (24m)
>     [............................]
>
> Thanks for the help.
>
> Best Regards,
>
> Edouard FAZENDA
> Technical Support
>
> Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
>
> <https://www.csti.ch/> www.csti.ch

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
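
For readers unsure how the workaround mentioned in this thread (set the global config to a newer image, then deploy a new mgr) might translate into commands, here is a rough sketch. It only builds and prints the command lines and runs nothing against a cluster. The option name container_image, the example image tag quay.io/ceph/ceph:v16.2.15, the choice of "redeploy" rather than adding a fresh mgr, and the helper name workaround_commands are all assumptions for illustration and are not confirmed anywhere in the thread.

```python
import shlex


def workaround_commands(target_image, mgr_daemon):
    """Return the ceph CLI invocations of the sketch, in order, as argv lists.

    Hypothetical helper: nothing is executed, the caller decides what to run.
    """
    return [
        # Pause the stuck upgrade so cephadm stops redeploying the mgr in a loop.
        ["ceph", "orch", "upgrade", "stop"],
        # Point the global container image at the target release
        # (assumed to be what "set the global config to a newer image" means).
        ["ceph", "config", "set", "global", "container_image", target_image],
        # Redeploy one mgr so that it starts from the newly configured image.
        ["ceph", "orch", "daemon", "redeploy", mgr_daemon],
        # List the daemons again to see which version each mgr now reports.
        ["ceph", "orch", "ps"],
    ]


if __name__ == "__main__":
    # Example values only: the image tag is an assumption, the daemon name
    # is the crashlooping mgr from the "ceph orch ps" output in this thread.
    for argv in workaround_commands("quay.io/ceph/ceph:v16.2.15", "mgr.rke-sh1-2.lxmguj"):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere. Whether the upgrade can then be resumed with "ceph orch upgrade start" depends on the cluster state and should be checked against the upgrade documentation linked in the thread.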