Okay, so the first thing I would do is stop the upgrade. Then make
sure that you have two running MGRs on the same version as the rest
of the cluster (16.2.1). If no other daemons have been upgraded yet,
it shouldn't be a big issue. If necessary, you can modify the unit.run
file and specify the container image for the MGRs there. If they both
start successfully, try upgrading to 16.2.15 (it was just released
this week) instead of 16.2.2.
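In rough terms, only as a sketch (untested; the fsid and daemon name
below are taken from your output, the unit.run path follows the usual
cephadm layout, and 16.2.15 is just the target I would aim for):

# stop the current upgrade
ceph orch upgrade stop

# if needed, pin the crashlooping MGR back to the 16.2.1 image in its
# unit.run, then restart it through systemd (service name per the
# standard cephadm naming scheme)
vi /var/lib/ceph/fcb373ce-7aaa-11eb-984f-e7c6e0038e87/mgr.rke-sh1-2.lxmguj/unit.run
systemctl restart ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@mgr.rke-sh1-2.lxmguj.service

# once both MGRs are up on 16.2.1 again, retry with the latest pacific release
ceph orch upgrade start --ceph-version 16.2.15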
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
Dear Eugen,
I have removed one mgr on node 3; the second one is still
crashlooping, and the mgr on node 1 is on 16.2.2.
I am not sure I understand your workaround.
* Stop the current upgrade, roll back if possible, and afterwards
upgrade to the latest pacific release?
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Wednesday, 6 March 2024 10:47
To: ceph-users@xxxxxxx
Subject: Re: Upgrade from 16.2.1 to 16.2.2 pacific stuck
There was another issue when having more than two MGRs, maybe you're
hitting that (https://tracker.ceph.com/issues/57675,
https://github.com/ceph/ceph/pull/48258). I believe my workaround
was to set the global config to a newer image (the target version) and
then deploy a new mgr.
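From memory it was roughly this, so treat it as a sketch rather than
the exact commands I ran (the image tag is just an example, here your
current target):

# point the cluster at the target image globally
ceph config set global container_image docker.io/ceph/ceph:v16.2.2

# then redeploy an existing mgr (or add a fresh one) so it starts on that image
ceph orch daemon redeploy mgr.rke-sh1-3.ckunvo
# alternatively, let the orchestrator reconcile the mgr service:
ceph orch apply mgr 3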
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
The process has now started, but I have the following error on the
mgr on the second node:
root@rke-sh1-1:~# ceph orch ps
NAME                          HOST       PORTS        STATUS         REFRESHED  AGE  VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2               running (12d)  2s ago     20M  16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1               running (12d)  41s ago    5M   16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2               running (12d)  2s ago     23M  16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3               running (12d)  41s ago    4M   16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  *:8082,9283  running (66m)  41s ago    2y   16.2.2   5e237c38caa6  123cabbc2994
mgr.rke-sh1-2.lxmguj          rke-sh1-2  *:8082,9283  running (6s)   2s ago     22M  16.2.2   5e237c38caa6  b2a9047be1d6
mgr.rke-sh1-3.ckunvo          rke-sh1-3  *:8082,9283  running (12d)  41s ago    7M   16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1               running (37m)  41s ago    37m  16.2.1   c757e4a3636b  84e63e0415a8
mon.rke-sh1-2                 rke-sh1-2               running (12d)  2s ago     4M   16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  *:8000       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  *:8001       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  *:8000       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  *:8001       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  *:8001       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  *:8000       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  e2710265a7f4
# tail -f /var/log/ceph/fcb373ce-7aaa-11eb-984f-e7c6e0038e87/ceph-mgr.rke-sh1-2.lxmguj.log
2024-03-06T09:24:42.468+0000 7fe68b500700 0 [dashboard DEBUG root] setting log level: INFO
2024-03-06T09:24:42.468+0000 7fe68b500700 1 mgr load Constructed class from module: dashboard
2024-03-06T09:24:42.468+0000 7fe68acff700 0 ms_deliver_dispatch: unhandled message 0x55f722292160 mon_map magic: 0 v1 from mon.0 v2:10.10.71.2:3300/0
2024-03-06T09:24:42.468+0000 7fe68b500700 0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
2024-03-06T09:24:42.468+0000 7fe68b500700 1 mgr load Constructed class from module: prometheus
2024-03-06T09:24:42.468+0000 7fe64110d700 0 [dashboard INFO root] server: ssl=no host=:: port=8082
2024-03-06T09:24:42.472+0000 7fe64110d700 0 [dashboard INFO root] Configured CherryPy, starting engine...
2024-03-06T09:24:42.472+0000 7fe64110d700 0 [dashboard INFO root] Starting engine...
2024-03-06T09:24:42.580+0000 7fe64110d700 0 [dashboard INFO root] Engine started...
2024-03-06T09:24:44.020+0000 7f0085fb8500 0 set uid:gid to 167:167 (ceph:ceph)
2024-03-06T09:24:44.020+0000 7f0085fb8500 0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
2024-03-06T09:24:44.020+0000 7f0085fb8500 0 pidfile_write: ignore empty --pid-file
2024-03-06T09:24:44.044+0000 7f0085fb8500 1 mgr[py] Loading python module 'alerts'
2024-03-06T09:24:44.156+0000 7f0085fb8500 1 mgr[py] Loading python module 'balancer'
2024-03-06T09:24:44.240+0000 7f0085fb8500 1 mgr[py] Loading python module 'cephadm'
2024-03-06T09:24:44.484+0000 7f0085fb8500 1 mgr[py] Loading python module 'crash'
2024-03-06T09:24:44.568+0000 7f0085fb8500 1 mgr[py] Loading python module 'dashboard'
2024-03-06T09:24:45.100+0000 7f0085fb8500 1 mgr[py] Loading python module 'devicehealth'
2024-03-06T09:24:45.184+0000 7f0085fb8500 1 mgr[py] Loading python module 'diskprediction_local'
2024-03-06T09:24:45.396+0000 7f0085fb8500 1 mgr[py] Loading python module 'influx'
2024-03-06T09:24:45.488+0000 7f0085fb8500 1 mgr[py] Loading python module 'insights'
2024-03-06T09:24:45.572+0000 7f0085fb8500 1 mgr[py] Loading python module 'iostat'
2024-03-06T09:24:45.724+0000 7f0085fb8500 1 mgr[py] Loading python module 'k8sevents'
2024-03-06T09:24:46.172+0000 7f0085fb8500 1 mgr[py] Loading python module 'localpool'
2024-03-06T09:24:46.260+0000 7f0085fb8500 1 mgr[py] Loading python module 'mds_autoscaler'
2024-03-06T09:24:46.416+0000 7f0085fb8500 1 mgr[py] Loading python module 'mirroring'
2024-03-06T09:24:46.528+0000 7f0085fb8500 1 mgr[py] Loading python module 'orchestrator'
2024-03-06T09:24:46.776+0000 7f0085fb8500 1 mgr[py] Loading python module 'osd_support'
2024-03-06T09:24:46.860+0000 7f0085fb8500 1 mgr[py] Loading python module 'pg_autoscaler'
2024-03-06T09:24:46.956+0000 7f0085fb8500 1 mgr[py] Loading python module 'progress'
2024-03-06T09:24:47.052+0000 7f0085fb8500 1 mgr[py] Loading python module 'prometheus'
2024-03-06T09:24:47.524+0000 7f0085fb8500 1 mgr[py] Loading python module 'rbd_support'
2024-03-06T09:24:47.640+0000 7f0085fb8500 1 mgr[py] Loading python module 'restful'
2024-03-06T09:24:47.924+0000 7f0085fb8500 1 mgr[py] Loading python module 'rook'
2024-03-06T09:24:48.536+0000 7f0085fb8500 1 mgr[py] Loading python module 'selftest'
2024-03-06T09:24:48.640+0000 7f0085fb8500 1 mgr[py] Loading python module 'snap_schedule'
2024-03-06T09:24:48.776+0000 7f0085fb8500 1 mgr[py] Loading python module 'stats'
2024-03-06T09:24:48.876+0000 7f0085fb8500 1 mgr[py] Loading python module 'status'
2024-03-06T09:24:48.984+0000 7f0085fb8500 1 mgr[py] Loading python module 'telegraf'
2024-03-06T09:24:49.088+0000 7f0085fb8500 1 mgr[py] Loading python module 'telemetry'
2024-03-06T09:24:49.248+0000 7f0085fb8500 1 mgr[py] Loading python module 'test_orchestrator'
2024-03-06T09:24:49.632+0000 7f0085fb8500 1 mgr[py] Loading python module 'volumes'
2024-03-06T09:24:49.832+0000 7f0085fb8500 1 mgr[py] Loading python module 'zabbix'
2024-03-06T09:24:49.936+0000 7f00739df700 0 [dashboard DEBUG root] setting log level: INFO
2024-03-06T09:24:49.936+0000 7f00739df700 1 mgr load Constructed class from module: dashboard
2024-03-06T09:24:49.936+0000 7f00731de700 0 ms_deliver_dispatch: unhandled message 0x556eb3224160 mon_map magic: 0 v1 from mon.2 v2:10.10.71.1:3300/0
2024-03-06T09:24:49.936+0000 7f00739df700 0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
2024-03-06T09:24:49.936+0000 7f00739df700 1 mgr load Constructed class from module: prometheus
2024-03-06T09:24:49.936+0000 7f00235e9700 0 [dashboard INFO root] server: ssl=no host=:: port=8082
2024-03-06T09:24:49.940+0000 7f00235e9700 0 [dashboard INFO root] Configured CherryPy, starting engine...
2024-03-06T09:24:49.940+0000 7f00235e9700 0 [dashboard INFO root] Starting engine...
2024-03-06T09:24:50.048+0000 7f00235e9700 0 [dashboard INFO root] Engine started...
2024-03-06T09:24:51.584+0000 7f0843ec9500 0 set uid:gid to 167:167 (ceph:ceph)
2024-03-06T09:24:51.584+0000 7f0843ec9500 0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
2024-03-06T09:24:51.584+0000 7f0843ec9500 0 pidfile_write: ignore empty --pid-file
# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mgr.rke-sh1-2.lxmguj
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTING
Mar 06 09:27:18 rke-sh1-2 bash[623306]: CherryPy Checker:
Mar 06 09:27:18 rke-sh1-2 bash[623306]: The Application mounted at '' has an empty config.
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Serving on http://:::9283
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTED
Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopping Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87...
Mar 06 09:27:18 rke-sh1-2 docker[624494]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87-mgr.rke-sh1-2.lxmguj
Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Main process exited, code=exited, status=143/n/a
Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Failed with result 'exit-code'.
Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopped Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
Mar 06 09:27:19 rke-sh1-2 systemd[1]: Started Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
The mgr.rke-sh1-2.lxmguj daemon is crashlooping.
Do you have an idea what is going on?
Could it be an issue with the dashboard module?
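If it helps narrow things down, I could try temporarily disabling the
dashboard module to see whether the crashloop stops, something like
(just an idea, not done yet):

ceph mgr module disable dashboard
# observe the mgr for a while, then re-enable it
ceph mgr module enable dashboard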
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
From: Edouard FAZENDA
Sent: Wednesday, 6 March 2024 09:42
To: ceph-users@xxxxxxx
Subject: Upgrade from 16.2.1 to 16.2.2 pacific stuck
Dear Ceph Community,
I am in the process of upgrading Ceph Pacific 16.2.1 to 16.2.2, and I
have followed the documentation:
https://docs.ceph.com/en/pacific/cephadm/upgrade/
My cluster is in a healthy state, but the upgrade is not making
progress; in the cephadm logs I see the following:
# ceph -W cephadm
2024-03-06T08:39:11.653447+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:12.281386+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:12.286096+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
2024-03-06T08:39:19.347877+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
2024-03-06T08:39:19.347989+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
2024-03-06T08:39:19.366355+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:19.965822+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:19.969089+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
2024-03-06T08:39:26.961455+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
2024-03-06T08:39:26.961502+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
2024-03-06T08:39:26.973897+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:27.623773+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:27.628115+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
My public_network is set:
root@rke-sh1-1:~# ceph config dump | grep public_network
mon  advanced  public_network  10.10.71.0/24  *
Do you have an idea why I am getting the following error:
Filtered out host: could not verify host allowed virtual ips
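If it helps, I can also double-check from each host that its address
really falls inside public_network, e.g. (just what I would try, in
case that is the right direction):

ceph orch host ls
ip -4 addr show | grep '10.10.71.'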
Current state of the upgrade:
# ceph orch upgrade status
{
"target_image":
"docker.io/ceph/ceph@sha256:8cdd8c7dfc7be5865255f0d59c048a1fb8d1335f69
23996e
2c2d9439499b5cf2
<mailto:docker.io/ceph/ceph@sha256:8cdd8c7dfc7be5865255f0d59c048a1fb8d
1335f6
923996e2c2d9439499b5cf2> ",
"in_progress": true,
"services_complete": [],
"progress": "0/35 ceph daemons upgraded",
"message": "Currently upgrading mgr daemons"
}
progress:
Upgrade to 16.2.2 (24m)
[............................]
Thanks for the help.
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
www.csti.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx