Re: Upgrade from 16.2.1 to 16.2.2 pacific stuck


 



Okay, so the first thing I would do is to stop the upgrade. Then make sure that you have two running MGRs on the same version as the rest of the cluster (.1). If no other daemons have been upgraded, it shouldn't be a big issue. If necessary, you can modify the unit.run file and specify the container image for the MGRs there. If they both start successfully, try an upgrade to 16.2.15 (which was just released this week) instead of 16.2.2.
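A minimal sketch of those steps, assuming a cephadm-managed Pacific cluster; the fsid and daemon name placeholders below are illustrative, not taken from the thread:

```shell
# Halt the stuck upgrade first
ceph orch upgrade stop

# If a mgr container will not start, pin its image in the unit.run file
# (standard cephadm path; <fsid> and <name> are placeholders):
#   /var/lib/ceph/<fsid>/mgr.<name>/unit.run
# Replace the image reference there with the cluster's current 16.2.1 image,
# then restart the daemon:
systemctl restart ceph-<fsid>@mgr.<name>.service

# With both MGRs healthy again, go straight to the latest Pacific point release
ceph orch upgrade start --ceph-version 16.2.15
```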

Zitat von Edouard FAZENDA <e.fazenda@xxxxxxx>:

Dear Eugen,

I have removed one mgr on node 3; the second one is still crashlooping, and the mgr on node 1 is on 16.2.2.

I'm not sure I understand your workaround:

* Stop the current upgrade to roll back if possible, and afterwards upgrade to the latest Pacific release?

Best Regards,



Edouard FAZENDA
Technical Support



Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40

www.csti.ch

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: mercredi, 6 mars 2024 10:47
To: ceph-users@xxxxxxx
Subject: Re: Upgrade from 16.2.1 to 16.2.2 pacific stuck

There was another issue when having more than two MGRs; maybe you're hitting that (https://tracker.ceph.com/issues/57675, https://github.com/ceph/ceph/pull/48258). I believe my workaround was to set the global config to a newer image (the target version) and then deploy a new mgr.
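For what it's worth, that workaround might look roughly like this (the image tag and daemon name are examples, not confirmed by the thread):

```shell
# Point the cluster-wide config at the target container image
ceph config set global container_image quay.io/ceph/ceph:v16.2.15

# Redeploy a mgr so it comes up on the configured image
ceph orch daemon redeploy mgr.rke-sh1-2.lxmguj
```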


Zitat von Edouard FAZENDA <e.fazenda@xxxxxxx>:

The process has now started but I have the following error on mgr to
the second node



root@rke-sh1-1:~# ceph orch ps

NAME                          HOST       PORTS        STATUS         REFRESHED  AGE  VERSION  IMAGE ID      CONTAINER ID
crash.rke-sh1-1               rke-sh1-1               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  e8652edb2b49
crash.rke-sh1-2               rke-sh1-2               running (12d)  2s ago     20M  16.2.1   c757e4a3636b  a1249a605ee0
crash.rke-sh1-3               rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  026667bc1776
mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1               running (12d)  41s ago    5M   16.2.1   c757e4a3636b  9b4c2b08b759
mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2               running (12d)  2s ago     23M  16.2.1   c757e4a3636b  71681a5f34d3
mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3               running (12d)  41s ago    4M   16.2.1   c757e4a3636b  e89946ad6b7e
mgr.rke-sh1-1.qskoyj          rke-sh1-1  *:8082,9283  running (66m)  41s ago    2y   16.2.2   5e237c38caa6  123cabbc2994
mgr.rke-sh1-2.lxmguj          rke-sh1-2  *:8082,9283  running (6s)   2s ago     22M  16.2.2   5e237c38caa6  b2a9047be1d6
mgr.rke-sh1-3.ckunvo          rke-sh1-3  *:8082,9283  running (12d)  41s ago    7M   16.2.1   c757e4a3636b  2fcaf18f3218
mon.rke-sh1-1                 rke-sh1-1               running (37m)  41s ago    37m  16.2.1   c757e4a3636b  84e63e0415a8
mon.rke-sh1-2                 rke-sh1-2               running (12d)  2s ago     4M   16.2.1   c757e4a3636b  f4b32ba4466b
mon.rke-sh1-3                 rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  d5e44c245998
osd.0                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  7b0e69942c15
osd.1                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  4451654d9a2d
osd.10                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  3f9d5f95e284
osd.11                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  db1cc6d2e37f
osd.12                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  de416c1ef766
osd.13                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  25a281cc5a9b
osd.14                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  62f25ba61667
osd.15                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  d3514d823c45
osd.16                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  bba857759bfe
osd.17                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  59281d4bb3d0
osd.2                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  418041b5e60d
osd.3                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  04a0e29d5623
osd.4                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  1cc78a5153d3
osd.5                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  39a4b11e31fb
osd.6                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  2f218ffb566e
osd.7                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  cf761fbe4d5f
osd.8                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  f9f85480e800
osd.9                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  664c54ff46d2
rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  *:8000       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  f03212b955a7
rgw.default.rke-sh1-1.vylchc  rke-sh1-1  *:8001       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  da486ce43fe5
rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  *:8000       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  ef4089d0aef2
rgw.default.rke-sh1-2.efkbum  rke-sh1-2  *:8001       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  9e053d5a2f7b
rgw.default.rke-sh1-3.krfgey  rke-sh1-3  *:8001       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  45cd3d75edd3
rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  *:8000       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  e2710265a7f4



# tail -f /var/log/ceph/fcb373ce-7aaa-11eb-984f-e7c6e0038e87/ceph-mgr.rke-sh1-2.lxmguj.log

2024-03-06T09:24:42.468+0000 7fe68b500700  0 [dashboard DEBUG root] setting log level: INFO
2024-03-06T09:24:42.468+0000 7fe68b500700  1 mgr load Constructed class from module: dashboard
2024-03-06T09:24:42.468+0000 7fe68acff700  0 ms_deliver_dispatch: unhandled message 0x55f722292160 mon_map magic: 0 v1 from mon.0 v2:10.10.71.2:3300/0
2024-03-06T09:24:42.468+0000 7fe68b500700  0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
2024-03-06T09:24:42.468+0000 7fe68b500700  1 mgr load Constructed class from module: prometheus
2024-03-06T09:24:42.468+0000 7fe64110d700  0 [dashboard INFO root] server: ssl=no host=:: port=8082
2024-03-06T09:24:42.472+0000 7fe64110d700  0 [dashboard INFO root] Configured CherryPy, starting engine...
2024-03-06T09:24:42.472+0000 7fe64110d700  0 [dashboard INFO root] Starting engine...
2024-03-06T09:24:42.580+0000 7fe64110d700  0 [dashboard INFO root] Engine started...
2024-03-06T09:24:44.020+0000 7f0085fb8500  0 set uid:gid to 167:167 (ceph:ceph)
2024-03-06T09:24:44.020+0000 7f0085fb8500  0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
2024-03-06T09:24:44.020+0000 7f0085fb8500  0 pidfile_write: ignore empty --pid-file
2024-03-06T09:24:44.044+0000 7f0085fb8500  1 mgr[py] Loading python module 'alerts'
2024-03-06T09:24:44.156+0000 7f0085fb8500  1 mgr[py] Loading python module 'balancer'
2024-03-06T09:24:44.240+0000 7f0085fb8500  1 mgr[py] Loading python module 'cephadm'
2024-03-06T09:24:44.484+0000 7f0085fb8500  1 mgr[py] Loading python module 'crash'
2024-03-06T09:24:44.568+0000 7f0085fb8500  1 mgr[py] Loading python module 'dashboard'
2024-03-06T09:24:45.100+0000 7f0085fb8500  1 mgr[py] Loading python module 'devicehealth'
2024-03-06T09:24:45.184+0000 7f0085fb8500  1 mgr[py] Loading python module 'diskprediction_local'
2024-03-06T09:24:45.396+0000 7f0085fb8500  1 mgr[py] Loading python module 'influx'
2024-03-06T09:24:45.488+0000 7f0085fb8500  1 mgr[py] Loading python module 'insights'
2024-03-06T09:24:45.572+0000 7f0085fb8500  1 mgr[py] Loading python module 'iostat'
2024-03-06T09:24:45.724+0000 7f0085fb8500  1 mgr[py] Loading python module 'k8sevents'
2024-03-06T09:24:46.172+0000 7f0085fb8500  1 mgr[py] Loading python module 'localpool'
2024-03-06T09:24:46.260+0000 7f0085fb8500  1 mgr[py] Loading python module 'mds_autoscaler'
2024-03-06T09:24:46.416+0000 7f0085fb8500  1 mgr[py] Loading python module 'mirroring'
2024-03-06T09:24:46.528+0000 7f0085fb8500  1 mgr[py] Loading python module 'orchestrator'
2024-03-06T09:24:46.776+0000 7f0085fb8500  1 mgr[py] Loading python module 'osd_support'
2024-03-06T09:24:46.860+0000 7f0085fb8500  1 mgr[py] Loading python module 'pg_autoscaler'
2024-03-06T09:24:46.956+0000 7f0085fb8500  1 mgr[py] Loading python module 'progress'
2024-03-06T09:24:47.052+0000 7f0085fb8500  1 mgr[py] Loading python module 'prometheus'
2024-03-06T09:24:47.524+0000 7f0085fb8500  1 mgr[py] Loading python module 'rbd_support'
2024-03-06T09:24:47.640+0000 7f0085fb8500  1 mgr[py] Loading python module 'restful'
2024-03-06T09:24:47.924+0000 7f0085fb8500  1 mgr[py] Loading python module 'rook'
2024-03-06T09:24:48.536+0000 7f0085fb8500  1 mgr[py] Loading python module 'selftest'
2024-03-06T09:24:48.640+0000 7f0085fb8500  1 mgr[py] Loading python module 'snap_schedule'
2024-03-06T09:24:48.776+0000 7f0085fb8500  1 mgr[py] Loading python module 'stats'
2024-03-06T09:24:48.876+0000 7f0085fb8500  1 mgr[py] Loading python module 'status'
2024-03-06T09:24:48.984+0000 7f0085fb8500  1 mgr[py] Loading python module 'telegraf'
2024-03-06T09:24:49.088+0000 7f0085fb8500  1 mgr[py] Loading python module 'telemetry'
2024-03-06T09:24:49.248+0000 7f0085fb8500  1 mgr[py] Loading python module 'test_orchestrator'
2024-03-06T09:24:49.632+0000 7f0085fb8500  1 mgr[py] Loading python module 'volumes'
2024-03-06T09:24:49.832+0000 7f0085fb8500  1 mgr[py] Loading python module 'zabbix'
2024-03-06T09:24:49.936+0000 7f00739df700  0 [dashboard DEBUG root] setting log level: INFO
2024-03-06T09:24:49.936+0000 7f00739df700  1 mgr load Constructed class from module: dashboard
2024-03-06T09:24:49.936+0000 7f00731de700  0 ms_deliver_dispatch: unhandled message 0x556eb3224160 mon_map magic: 0 v1 from mon.2 v2:10.10.71.1:3300/0
2024-03-06T09:24:49.936+0000 7f00739df700  0 [prometheus DEBUG root] setting log level based on debug_mgr: WARNING (1/5)
2024-03-06T09:24:49.936+0000 7f00739df700  1 mgr load Constructed class from module: prometheus
2024-03-06T09:24:49.936+0000 7f00235e9700  0 [dashboard INFO root] server: ssl=no host=:: port=8082
2024-03-06T09:24:49.940+0000 7f00235e9700  0 [dashboard INFO root] Configured CherryPy, starting engine...
2024-03-06T09:24:49.940+0000 7f00235e9700  0 [dashboard INFO root] Starting engine...
2024-03-06T09:24:50.048+0000 7f00235e9700  0 [dashboard INFO root] Engine started...
2024-03-06T09:24:51.584+0000 7f0843ec9500  0 set uid:gid to 167:167 (ceph:ceph)
2024-03-06T09:24:51.584+0000 7f0843ec9500  0 ceph version 16.2.2 (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process ceph-mgr, pid 7
2024-03-06T09:24:51.584+0000 7f0843ec9500  0 pidfile_write: ignore empty --pid-file



# cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mgr.rke-sh1-2.lxmguj

Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTING
Mar 06 09:27:18 rke-sh1-2 bash[623306]: CherryPy Checker:
Mar 06 09:27:18 rke-sh1-2 bash[623306]: The Application mounted at '' has an empty config.
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Serving on http://:::9283
Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE Bus STARTED
Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopping Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87...
Mar 06 09:27:18 rke-sh1-2 docker[624494]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87-mgr.rke-sh1-2.lxmguj
Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Main process exited, code=exited, status=143/n/a
Mar 06 09:27:18 rke-sh1-2 systemd[1]: ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service: Failed with result 'exit-code'.
Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopped Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
Mar 06 09:27:19 rke-sh1-2 systemd[1]: Started Ceph mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.



The mgr.rke-sh1-2.lxmguj daemon is crashlooping.



Do you have an idea of what's going on?



An issue with the dashboard module?



Best Regards,



Edouard FAZENDA

Technical Support







Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40



www.csti.ch



From: Edouard FAZENDA
Sent: mercredi, 6 mars 2024 09:42
To: ceph-users@xxxxxxx
Subject: Upgrade from 16.2.1 to 16.2.2 pacific stuck



Dear Ceph Community,



I am in the process of upgrading Ceph Pacific from 16.2.1 to 16.2.2, and I have followed the documentation:
https://docs.ceph.com/en/pacific/cephadm/upgrade/



My cluster is in a healthy state, but the upgrade is not going forward; the cephadm logs show the following:



# ceph -W cephadm

2024-03-06T08:39:11.653447+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:12.281386+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:12.286096+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
2024-03-06T08:39:19.347877+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
2024-03-06T08:39:19.347989+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
2024-03-06T08:39:19.366355+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:19.965822+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:19.969089+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
2024-03-06T08:39:26.961455+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-1: could not verify host allowed virtual ips
2024-03-06T08:39:26.961502+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered out host rke-sh1-3: could not verify host allowed virtual ips
2024-03-06T08:39:26.973897+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
2024-03-06T08:39:27.623773+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: Updating mgr.rke-sh1-2.lxmguj
2024-03-06T08:39:27.628115+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2



My public_network is set:



root@rke-sh1-1:~# ceph config dump | grep public_network

  mon    advanced  public_network    10.10.71.0/24    *

Do you have an idea why I get the following error:



Filtered out host: could not verify host allowed virtual ips
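One way to narrow this down (a sketch; cephadm's internal check may differ from this) is to confirm that every host actually holds an address inside the configured public_network:

```shell
# What the cluster thinks the public network is
ceph config get mon public_network

# Hosts known to the orchestrator
ceph orch host ls

# On each host: is there an interface address in 10.10.71.0/24?
ip -4 -o addr show | grep '10\.10\.71\.'
```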





Current state of the upgrade:



# ceph orch upgrade status

{
    "target_image": "docker.io/ceph/ceph@sha256:8cdd8c7dfc7be5865255f0d59c048a1fb8d1335f6923996e2c2d9439499b5cf2",
    "in_progress": true,
    "services_complete": [],
    "progress": "0/35 ceph daemons upgraded",
    "message": "Currently upgrading mgr daemons"
}



  progress:

    Upgrade to 16.2.2 (24m)

      [............................]



Thanks for the help.



Best Regards,



Edouard FAZENDA

Technical Support







Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40



www.csti.ch


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





