There was another issue with more than two MGRs, maybe you're
hitting that (https://tracker.ceph.com/issues/57675,
https://github.com/ceph/ceph/pull/48258). I believe my workaround was
to set the global config to a newer image (the target version) and then
deploy a new mgr.
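A minimal sketch of that workaround (hedged: the image tag and the mgr daemon name below are taken from your `ceph orch ps` output and may need adjusting for your cluster):

```shell
# Point the cluster's default container image at the target version
# so newly deployed daemons use it.
ceph config set global container_image docker.io/ceph/ceph:v16.2.2

# Redeploy the stuck standby mgr so it comes up on the new image
# (daemon name from the ceph orch ps output below).
ceph orch daemon redeploy mgr.rke-sh1-2.lxmguj

# Fail over the active mgr and let the upgrade continue.
ceph mgr fail
ceph orch upgrade resume
```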
Quoting Edouard FAZENDA <e.fazenda@xxxxxxx>:
The process has now started, but I get the following error from the mgr
on the second node:
root@rke-sh1-1:~# ceph orch ps
NAME                          HOST       PORTS        STATUS         REFRESHED  AGE  VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2               running (12d)  2s ago     20M  16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1               running (12d)  41s ago    5M   16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2               running (12d)  2s ago     23M  16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3               running (12d)  41s ago    4M   16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  *:8082,9283  running (66m)  41s ago    2y   16.2.2   5e237c38caa6  123cabbc2994
mgr.rke-sh1-2.lxmguj          rke-sh1-2  *:8082,9283  running (6s)   2s ago     22M  16.2.2   5e237c38caa6  b2a9047be1d6
mgr.rke-sh1-3.ckunvo          rke-sh1-3  *:8082,9283  running (12d)  41s ago    7M   16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1               running (37m)  41s ago    37m  16.2.1   c757e4a3636b  84e63e0415a8
mon.rke-sh1-2                 rke-sh1-2               running (12d)  2s ago     4M   16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  *:8000       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  *:8001       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  *:8000       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  *:8001       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  *:8001       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  *:8000       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  e2710265a7f4
# tail -f /var/log/ceph/fcb373ce-7aaa-11eb-984f-e7c6e0038e87/ceph-mgr.rke-sh1-2.lxmguj.log
2024-03-06T09:24:42.468+0000 7fe68b500700  0 [dashboard DEBUG root] setting log level: INFO
2024-03-06T09:24:42.468+0000 7fe68b500700  1 mgr load Constructed class from module: dashboard
2024-03-06T09:24:42.468+0000 7fe68acff700  0 ms_deliver_dispatch: unhandled message 0x55f722292160 mon_map magic: 0 v1 from mon.0 v2:10.10.71.2:3300/0
2024-03-06T09:24:42.468+0000 7fe68b500700  0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
2024-03-06T09:24:42.468+0000 7fe68b500700  1 mgr load Constructed class from module: prometheus
2024-03-06T09:24:42.468+0000 7fe64110d700  0 [dashboard INFO root] server: ssl=no host=:: port=8082
2024-03-06T09:24:42.472+0000 7fe64110d700  0 [dashboard INFO root] Configured CherryPy, starting engine...
2024-03-06T09:24:42.472+0000 7fe64110d700  0 [dashboard INFO root] Starting engine...
2024-03-06T09:24:42.580+0000 7fe64110d700  0 [dashboard INFO root] Engine started...
2024-03-06T09:24:44.020+0000 7f0085fb8500  0 set uid:gid to 167:167 (ceph:ceph)
2024-03-06T09:24:44.020+0000 7f0085fb8500  0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
2024-03-06T09:24:44.020+0000 7f0085fb8500  0 pidfile_write: ignore empty --pid-file
2024-03-06T09:24:44.044+0000 7f0085fb8500  1 mgr[py] Loading python module 'alerts'
2024-03-06T09:24:44.156+0000 7f0085fb8500  1 mgr[py] Loading python module 'balancer'
2024-03-06T09:24:44.240+0000 7f0085fb8500  1 mgr[py] Loading python module 'cephadm'
2024-03-06T09:24:44.484+0000 7f0085fb8500  1 mgr[py] Loading python module 'crash'
2024-03-06T09:24:44.568+0000 7f0085fb8500  1 mgr[py] Loading python module 'dashboard'
2024-03-06T09:24:45.100+0000 7f0085fb8500  1 mgr[py] Loading python module 'devicehealth'
2024-03-06T09:24:45.184+0000 7f0085fb8500  1 mgr[py] Loading python module 'diskprediction_local'
2024-03-06T09:24:45.396+0000 7f0085fb8500  1 mgr[py] Loading python module 'influx'
2024-03-06T09:24:45.488+0000 7f0085fb8500  1 mgr[py] Loading python module 'insights'
2024-03-06T09:24:45.572+0000 7f0085fb8500  1 mgr[py] Loading python module 'iostat'
2024-03-06T09:24:45.724+0000 7f0085fb8500  1 mgr[py] Loading python module 'k8sevents'
2024-03-06T09:24:46.172+0000 7f0085fb8500  1 mgr[py] Loading python module 'localpool'
2024-03-06T09:24:46.260+0000 7f0085fb8500  1 mgr[py] Loading python module 'mds_autoscaler'
2024-03-06T09:24:46.416+0000 7f0085fb8500  1 mgr[py] Loading python module 'mirroring'
2024-03-06T09:24:46.528+0000 7f0085fb8500  1 mgr[py] Loading python module 'orchestrator'
2024-03-06T09:24:46.776+0000 7f0085fb8500  1 mgr[py] Loading python module 'osd_support'
2024-03-06T09:24:46.860+0000 7f0085fb8500  1 mgr[py] Loading python module 'pg_autoscaler'
2024-03-06T09:24:46.956+0000 7f0085fb8500  1 mgr[py] Loading python module 'progress'
2024-03-06T09:24:47.052+0000 7f0085fb8500  1 mgr[py] Loading python module 'prometheus'
2024-03-06T09:24:47.524+0000 7f0085fb8500  1 mgr[py] Loading python module 'rbd_support'
2024-03-06T09:24:47.640+0000 7f0085fb8500  1 mgr[py] Loading python module 'restful'
2024-03-06T09:24:47.924+0000 7f0085fb8500  1 mgr[py] Loading python module 'rook'
2024-03-06T09:24:48.536+0000 7f0085fb8500  1 mgr[py] Loading python module 'selftest'
2024-03-06T09:24:48.640+0000 7f0085fb8500  1 mgr[py] Loading python module 'snap_schedule'
2024-03-06T09:24:48.776+0000 7f0085fb8500  1 mgr[py] Loading python module 'stats'
2024-03-06T09:24:48.876+0000 7f0085fb8500  1 mgr[py] Loading python module 'status'
2024-03-06T09:24:48.984+0000 7f0085fb8500  1 mgr[py] Loading python module 'telegraf'
2024-03-06T09:24:49.088+0000 7f0085fb8500  1 mgr[py] Loading python module 'telemetry'
2024-03-06T09:24:49.248+0000 7f0085fb8500  1 mgr[py] Loading python module 'test_orchestrator'
2024-03-06T09:24:49.632+0000 7f0085fb8500  1 mgr[py] Loading python module 'volumes'
2024-03-06T09:24:49.832+0000 7f0085fb8500  1 mgr[py] Loading python module 'zabbix'
2024-03-06T09:24:49.936+0000 7f00739df700  0 [dashboard DEBUG root] setting log level: INFO
2024-03-06T09:24:49.936+0000 7f00739df700  1 mgr load Constructed class from module: dashboard
2024-03-06T09:24:49.936+0000 7f00731de700  0 ms_deliver_dispatch: unhandled message 0x556eb3224160 mon_map magic: 0 v1 from mon.2 v2:10.10.71.1:3300/0
2024-03-06T09:24:49.936+0000 7f00739df700  0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
2024-03-06T09:24:49.936+0000 7f00739df700  1 mgr load Constructed class from module: prometheus
2024-03-06T09:24:49.936+0000 7f00235e9700  0 [dashboard INFO root] server: ssl=no host=:: port=8082
2024-03-06T09:24:49.940+0000 7f00235e9700  0 [dashboard INFO root] Configured CherryPy, starting engine...
2024-03-06T09:24:49.940+0000 7f00235e9700  0 [dashboard INFO root] Starting engine...
2024-03-06T09:24:50.048+0000 7f00235e9700  0 [dashboard INFO root] Engine started...
2024-03-06T09:24:51.584+0000 7f0843ec9500  0 set uid:gid to 167:167 (ceph:ceph)
2024-03-06T09:24:51.584+0000 7f0843ec9500  0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
2024-03-06T09:24:51.584+0000 7f0843ec9500  0 pidfile_write: ignore empty --pid-file
# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mgr.rke-sh1-2.lxmguj
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTING
Mar 06 09:27:18 rke-sh1-2 bash[623306]: CherryPy Checker:
Mar 06 09:27:18 rke-sh1-2 bash[623306]: The Application mounted at '' has an empty config.
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Serving on http://:::9283
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTED
Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopping Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87...
Mar 06 09:27:18 rke-sh1-2 docker[624494]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87-mgr.rke-sh1-2.lxmguj
Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Main process exited, code=exited, status=143/n/a
Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Failed with result 'exit-code'.
Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopped Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
Mar 06 09:27:19 rke-sh1-2 systemd[1]: Started Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
The mgr.rke-sh1-2.lxmguj daemon is crash-looping.
Do you have an idea of what is going on?
Could it be an issue with the dashboard module?
Best Regards,
Edouard FAZENDA
Technical Support
Chemin du Curé-Desclouds 2, CH-1226 THONEX +41 (0)22 869 04 40
<https://www.csti.ch/> www.csti.ch
From: Edouard FAZENDA
Sent: mercredi, 6 mars 2024 09:42
To: ceph-users@xxxxxxx
Subject: Upgrade from 16.2.1 to 16.2.2 pacific stuck
Dear Ceph Community,
I am in the process of upgrading Ceph Pacific from 16.2.1 to 16.2.2 and
have followed the documentation:
https://docs.ceph.com/en/pacific/cephadm/upgrade/
My cluster is in a healthy state, but the upgrade is not going forward;
the cephadm logs show the following:
# ceph -W cephadm
2024-03-06T08:39:11.653447+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:12.281386+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:12.286096+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
2024-03-06T08:39:19.347877+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
2024-03-06T08:39:19.347989+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
2024-03-06T08:39:19.366355+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:19.965822+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:19.969089+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
2024-03-06T08:39:26.961455+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
2024-03-06T08:39:26.961502+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
2024-03-06T08:39:26.973897+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:27.623773+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:27.628115+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
My public_network is set:
root@rke-sh1-1:~# ceph config dump | grep public_network
mon  advanced  public_network  10.10.71.0/24  *
Do you have an idea why I get the following error?
Filtered out host: could not verify host allowed virtual ips
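To narrow down the "could not verify host allowed virtual ips" message, it may help to check what cephadm knows about each host and its networks. A sketch with standard cephadm commands (host name taken from your output):

```shell
# List hosts as the orchestrator sees them
ceph orch host ls

# Run cephadm's host sanity checks on one of the filtered hosts
ceph cephadm check-host rke-sh1-1

# Confirm the mon public_network the mgr is validating against
ceph config get mon public_network
```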
Current state of the upgrade :
# ceph orch upgrade status
{
    "target_image": "docker.io/ceph/ceph@sha256:8cdd8c7dfc7be5865255f0d59c048a1fb8d1335f6923996e2c2d9439499b5cf2",
    "in_progress": true,
    "services_complete": [],
    "progress": "0/35 ceph daemons upgraded",
    "message": "Currently upgrading mgr daemons"
}
progress:
Upgrade to 16.2.2 (24m)
[............................]
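If it stays stuck on the mgr step, a common sequence (a sketch, not guaranteed to fix the crash loop itself) is to pause and resume the upgrade, or to fail over the active mgr so an already-upgraded standby takes over, then watch progress:

```shell
# Pause and resume the upgrade state machine
ceph orch upgrade pause
ceph orch upgrade resume

# Or force a mgr failover and watch cephadm's progress log
ceph mgr fail
ceph -W cephadm
```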
Thanks for the help.
Best Regards,
Edouard FAZENDA
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx