Re: Upgrade from 16.2.1 to 16.2.2 pacific stuck


 



Dear Eugen,

I have removed one mgr on node 3; the second one is still crash-looping, and the mgr on node 1 is on 16.2.2.

I am not sure I understand your workaround.

* Do you mean stopping the current upgrade, rolling back if possible, and afterwards upgrading to the latest release of Pacific?
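For reference, pausing and retargeting a cephadm upgrade would look roughly like this (a sketch only; the point release below is a placeholder, not a recommendation, and daemons already upgraded cannot be downgraded):

```shell
# Pause the in-progress upgrade. Daemons already running the new
# version stay on it; cephadm has no supported downgrade path.
ceph orch upgrade stop

# Confirm no upgrade is in progress any more
ceph orch upgrade status

# Resume against a different target, e.g. a later Pacific point
# release (version below is a placeholder)
ceph orch upgrade start --ceph-version 16.2.15
```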

Best Regards, 



Edouard FAZENDA
Technical Support
 


Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40
 
www.csti.ch

-----Original Message-----
From: Eugen Block <eblock@xxxxxx> 
Sent: mercredi, 6 mars 2024 10:47
To: ceph-users@xxxxxxx
Subject:  Re: Upgrade from 16.2.1 to 16.2.2 pacific stuck

There was another issue when having more than two MGRs; maybe you're hitting that (https://tracker.ceph.com/issues/57675,
https://github.com/ceph/ceph/pull/48258). I believe my workaround was to set the global config to a newer image (the target version) and then deploy a new mgr.
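In command form, that workaround would look something like the following (a sketch; the image digest is a placeholder, take the real one from the `target_image` shown by `ceph orch upgrade status`, and the mgr daemon name must match yours):

```shell
# Pin the cluster-wide container image to the upgrade target
# (placeholder digest, substitute the actual target_image)
ceph config set global container_image docker.io/ceph/ceph@sha256:<target-digest>

# Redeploy the stuck mgr so it is recreated from the pinned image
# (daemon name taken from the thread, adjust to your cluster)
ceph orch daemon redeploy mgr.rke-sh1-2.lxmguj
```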


Zitat von Edouard FAZENDA <e.fazenda@xxxxxxx>:

> The process has now started but I have the following error on mgr to 
> the second node
>
>
>
> root@rke-sh1-1:~# ceph orch ps
>
> NAME                          HOST       PORTS        STATUS         REFRESHED  AGE  VERSION  IMAGE ID      CONTAINER ID
>
> crash.rke-sh1-1               rke-sh1-1               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  e8652edb2b49
> crash.rke-sh1-2               rke-sh1-2               running (12d)  2s ago     20M  16.2.1   c757e4a3636b  a1249a605ee0
> crash.rke-sh1-3               rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  026667bc1776
> mds.cephfs.rke-sh1-1.ojmpnk   rke-sh1-1               running (12d)  41s ago    5M   16.2.1   c757e4a3636b  9b4c2b08b759
> mds.cephfs.rke-sh1-2.isqjza   rke-sh1-2               running (12d)  2s ago     23M  16.2.1   c757e4a3636b  71681a5f34d3
> mds.cephfs.rke-sh1-3.vdicdn   rke-sh1-3               running (12d)  41s ago    4M   16.2.1   c757e4a3636b  e89946ad6b7e
> mgr.rke-sh1-1.qskoyj          rke-sh1-1  *:8082,9283  running (66m)  41s ago    2y   16.2.2   5e237c38caa6  123cabbc2994
> mgr.rke-sh1-2.lxmguj          rke-sh1-2  *:8082,9283  running (6s)   2s ago     22M  16.2.2   5e237c38caa6  b2a9047be1d6
> mgr.rke-sh1-3.ckunvo          rke-sh1-3  *:8082,9283  running (12d)  41s ago    7M   16.2.1   c757e4a3636b  2fcaf18f3218
> mon.rke-sh1-1                 rke-sh1-1               running (37m)  41s ago    37m  16.2.1   c757e4a3636b  84e63e0415a8
> mon.rke-sh1-2                 rke-sh1-2               running (12d)  2s ago     4M   16.2.1   c757e4a3636b  f4b32ba4466b
> mon.rke-sh1-3                 rke-sh1-3               running (12d)  41s ago    12d  16.2.1   c757e4a3636b  d5e44c245998
> osd.0                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  7b0e69942c15
> osd.1                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  4451654d9a2d
> osd.10                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  3f9d5f95e284
> osd.11                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  db1cc6d2e37f
> osd.12                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  de416c1ef766
> osd.13                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  25a281cc5a9b
> osd.14                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  62f25ba61667
> osd.15                        rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  d3514d823c45
> osd.16                        rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  bba857759bfe
> osd.17                        rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  59281d4bb3d0
> osd.2                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  418041b5e60d
> osd.3                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  04a0e29d5623
> osd.4                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  1cc78a5153d3
> osd.5                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  39a4b11e31fb
> osd.6                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  2f218ffb566e
> osd.7                         rke-sh1-1               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  cf761fbe4d5f
> osd.8                         rke-sh1-3               running (12d)  41s ago    3y   16.2.1   c757e4a3636b  f9f85480e800
> osd.9                         rke-sh1-2               running (12d)  2s ago     3y   16.2.1   c757e4a3636b  664c54ff46d2
> rgw.default.rke-sh1-1.dgucwl  rke-sh1-1  *:8000       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  f03212b955a7
> rgw.default.rke-sh1-1.vylchc  rke-sh1-1  *:8001       running (12d)  41s ago    22M  16.2.1   c757e4a3636b  da486ce43fe5
> rgw.default.rke-sh1-2.dfhhfw  rke-sh1-2  *:8000       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  ef4089d0aef2
> rgw.default.rke-sh1-2.efkbum  rke-sh1-2  *:8001       running (12d)  2s ago     2y   16.2.1   c757e4a3636b  9e053d5a2f7b
> rgw.default.rke-sh1-3.krfgey  rke-sh1-3  *:8001       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  45cd3d75edd3
> rgw.default.rke-sh1-3.pwdbmp  rke-sh1-3  *:8000       running (12d)  41s ago    9M   16.2.1   c757e4a3636b  e2710265a7f4
>
>
>
> # tail -f /var/log/ceph/fcb373ce-7aaa-11eb-984f-e7c6e0038e87/ceph-mgr.rke-sh1-2.lxmguj.log
>
> 2024-03-06T09:24:42.468+0000 7fe68b500700  0 [dashboard DEBUG root] 
> setting log level: INFO
>
> 2024-03-06T09:24:42.468+0000 7fe68b500700  1 mgr load Constructed 
> class from
> module: dashboard
>
> 2024-03-06T09:24:42.468+0000 7fe68acff700  0 ms_deliver_dispatch: 
> unhandled message 0x55f722292160 mon_map magic: 0 v1 from mon.0 
> v2:10.10.71.2:3300/0
>
> 2024-03-06T09:24:42.468+0000 7fe68b500700  0 [prometheus DEBUG root] 
> setting log level based on debug_mgr: WARNING (1/5)
>
> 2024-03-06T09:24:42.468+0000 7fe68b500700  1 mgr load Constructed 
> class from
> module: prometheus
>
> 2024-03-06T09:24:42.468+0000 7fe64110d700  0 [dashboard INFO root] server:
> ssl=no host=:: port=8082
>
> 2024-03-06T09:24:42.472+0000 7fe64110d700  0 [dashboard INFO root] 
> Configured CherryPy, starting engine...
>
> 2024-03-06T09:24:42.472+0000 7fe64110d700  0 [dashboard INFO root] 
> Starting engine...
>
> 2024-03-06T09:24:42.580+0000 7fe64110d700  0 [dashboard INFO root] 
> Engine started...
>
> 2024-03-06T09:24:44.020+0000 7f0085fb8500  0 set uid:gid to 167:167
> (ceph:ceph)
>
> 2024-03-06T09:24:44.020+0000 7f0085fb8500  0 ceph version 16.2.2
> (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process 
> ceph-mgr, pid 7
>
> 2024-03-06T09:24:44.020+0000 7f0085fb8500  0 pidfile_write: ignore 
> empty --pid-file
>
> 2024-03-06T09:24:44.044+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'alerts'
>
> 2024-03-06T09:24:44.156+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'balancer'
>
> 2024-03-06T09:24:44.240+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'cephadm'
>
> 2024-03-06T09:24:44.484+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'crash'
>
> 2024-03-06T09:24:44.568+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'dashboard'
>
> 2024-03-06T09:24:45.100+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'devicehealth'
>
> 2024-03-06T09:24:45.184+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'diskprediction_local'
>
> 2024-03-06T09:24:45.396+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'influx'
>
> 2024-03-06T09:24:45.488+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'insights'
>
> 2024-03-06T09:24:45.572+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'iostat'
>
> 2024-03-06T09:24:45.724+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'k8sevents'
>
> 2024-03-06T09:24:46.172+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'localpool'
>
> 2024-03-06T09:24:46.260+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'mds_autoscaler'
>
> 2024-03-06T09:24:46.416+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'mirroring'
>
> 2024-03-06T09:24:46.528+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'orchestrator'
>
> 2024-03-06T09:24:46.776+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'osd_support'
>
> 2024-03-06T09:24:46.860+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'pg_autoscaler'
>
> 2024-03-06T09:24:46.956+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'progress'
>
> 2024-03-06T09:24:47.052+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'prometheus'
>
> 2024-03-06T09:24:47.524+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'rbd_support'
>
> 2024-03-06T09:24:47.640+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'restful'
>
> 2024-03-06T09:24:47.924+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'rook'
>
> 2024-03-06T09:24:48.536+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'selftest'
>
> 2024-03-06T09:24:48.640+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'snap_schedule'
>
> 2024-03-06T09:24:48.776+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'stats'
>
> 2024-03-06T09:24:48.876+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'status'
>
> 2024-03-06T09:24:48.984+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'telegraf'
>
> 2024-03-06T09:24:49.088+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'telemetry'
>
> 2024-03-06T09:24:49.248+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'test_orchestrator'
>
> 2024-03-06T09:24:49.632+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'volumes'
>
> 2024-03-06T09:24:49.832+0000 7f0085fb8500  1 mgr[py] Loading python 
> module 'zabbix'
>
> 2024-03-06T09:24:49.936+0000 7f00739df700  0 [dashboard DEBUG root] 
> setting log level: INFO
>
> 2024-03-06T09:24:49.936+0000 7f00739df700  1 mgr load Constructed 
> class from
> module: dashboard
>
> 2024-03-06T09:24:49.936+0000 7f00731de700  0 ms_deliver_dispatch: 
> unhandled message 0x556eb3224160 mon_map magic: 0 v1 from mon.2 
> v2:10.10.71.1:3300/0
>
> 2024-03-06T09:24:49.936+0000 7f00739df700  0 [prometheus DEBUG root] 
> setting log level based on debug_mgr: WARNING (1/5)
>
> 2024-03-06T09:24:49.936+0000 7f00739df700  1 mgr load Constructed 
> class from
> module: prometheus
>
> 2024-03-06T09:24:49.936+0000 7f00235e9700  0 [dashboard INFO root] server:
> ssl=no host=:: port=8082
>
> 2024-03-06T09:24:49.940+0000 7f00235e9700  0 [dashboard INFO root] 
> Configured CherryPy, starting engine...
>
> 2024-03-06T09:24:49.940+0000 7f00235e9700  0 [dashboard INFO root] 
> Starting engine...
>
> 2024-03-06T09:24:50.048+0000 7f00235e9700  0 [dashboard INFO root] 
> Engine started...
>
> 2024-03-06T09:24:51.584+0000 7f0843ec9500  0 set uid:gid to 167:167
> (ceph:ceph)
>
> 2024-03-06T09:24:51.584+0000 7f0843ec9500  0 ceph version 16.2.2
> (e8f22dde28889481f4dda2beb8a07788204821d3) pacific (stable), process 
> ceph-mgr, pid 7
>
> 2024-03-06T09:24:51.584+0000 7f0843ec9500  0 pidfile_write: ignore 
> empty --pid-file
>
>
>
> # cephadm logs --fsid fcb373ce-7aaa-11eb-984f-e7c6e0038e87 --name mgr.rke-sh1-2.lxmguj
>
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE 
> Bus STARTING
>
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: CherryPy Checker:
>
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: The Application mounted at '' 
> has an empty config.
>
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE 
> Serving on http://:::9283
>
> Mar 06 09:27:18 rke-sh1-2 bash[623306]: [06/Mar/2024:09:27:18] ENGINE 
> Bus STARTED
>
> Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopping Ceph 
> mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87...
>
> Mar 06 09:27:18 rke-sh1-2 docker[624494]:
> ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87-mgr.rke-sh1-2.lxmguj
>
> Mar 06 09:27:18 rke-sh1-2 systemd[1]:
> ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service
> : Main process exited, code=exited, status=143/n/a
>
> Mar 06 09:27:18 rke-sh1-2 systemd[1]:
> ceph-fcb373ce-7aaa-11eb-984f-e7c6e0038e87@xxxxxxx-sh1-2.lxmguj.service:
> Failed with result 'exit-code'.
>
> Mar 06 09:27:18 rke-sh1-2 systemd[1]: Stopped Ceph 
> mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
>
> Mar 06 09:27:19 rke-sh1-2 systemd[1]: Started Ceph 
> mgr.rke-sh1-2.lxmguj for fcb373ce-7aaa-11eb-984f-e7c6e0038e87.
>
>
>
> The mgr.rke-sh1-2.lxmguj daemon is crashlooping.
>
>
>
> Do you have an idea of what is going on?
>
>
>
> Could it be an issue with the dashboard module?
>
>
>
> Best Regards,
>
>
>
> Edouard FAZENDA
>
> Technical Support
>
>
>
>
>
>
>
> Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40
>
>
>
>  <https://www.csti.ch/> www.csti.ch
>
>
>
> From: Edouard FAZENDA
> Sent: mercredi, 6 mars 2024 09:42
> To: ceph-users@xxxxxxx
> Subject: Upgrade from 16.2.1 to 16.2.2 pacific stuck
>
>
>
> Dear Ceph Community,
>
>
>
> I am in the process of upgrading Ceph Pacific from 16.2.1 to 16.2.2. I
> have followed the documentation:
> https://docs.ceph.com/en/pacific/cephadm/upgrade/
>
>
>
> My cluster is in a healthy state, but the upgrade is not progressing;
> in the cephadm logs I see the following:
>
>
>
> # ceph -W cephadm
>
> 2024-03-06T08:39:11.653447+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: 
> Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
>
> 2024-03-06T08:39:12.281386+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: 
> Updating mgr.rke-sh1-2.lxmguj
>
> 2024-03-06T08:39:12.286096+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying 
> daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
>
> 2024-03-06T08:39:19.347877+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered 
> out host
> rke-sh1-1: could not verify host allowed virtual ips
>
> 2024-03-06T08:39:19.347989+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered 
> out host
> rke-sh1-3: could not verify host allowed virtual ips
>
> 2024-03-06T08:39:19.366355+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: 
> Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
>
> 2024-03-06T08:39:19.965822+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: 
> Updating mgr.rke-sh1-2.lxmguj
>
> 2024-03-06T08:39:19.969089+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying 
> daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
>
> 2024-03-06T08:39:26.961455+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered 
> out host
> rke-sh1-1: could not verify host allowed virtual ips
>
> 2024-03-06T08:39:26.961502+0000 mgr.rke-sh1-1.qskoyj [INF] Filtered 
> out host
> rke-sh1-3: could not verify host allowed virtual ips
>
> 2024-03-06T08:39:26.973897+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: 
> Need to upgrade myself (mgr.rke-sh1-1.qskoyj)
>
> 2024-03-06T08:39:27.623773+0000 mgr.rke-sh1-1.qskoyj [INF] Upgrade: 
> Updating mgr.rke-sh1-2.lxmguj
>
> 2024-03-06T08:39:27.628115+0000 mgr.rke-sh1-1.qskoyj [INF] Deploying 
> daemon mgr.rke-sh1-2.lxmguj on rke-sh1-2
>
>
>
> My public_network is set :
>
>
>
> root@rke-sh1-1:~# ceph config dump | grep public_network
>
>   mon  advanced  public_network  10.10.71.0/24  *
>
> Do you have an idea why I am getting the following error?
>
>
>
> Filtered out host: could not verify host allowed virtual ips
>
>
>
>
>
> Current state of the upgrade :
>
>
>
> # ceph orch upgrade status
>
> {
>
>     "target_image": "docker.io/ceph/ceph@sha256:8cdd8c7dfc7be5865255f0d59c048a1fb8d1335f6923996e2c2d9439499b5cf2",
>
>     "in_progress": true,
>
>     "services_complete": [],
>
>     "progress": "0/35 ceph daemons upgraded",
>
>     "message": "Currently upgrading mgr daemons"
>
> }
>
>
>
>   progress:
>
>     Upgrade to 16.2.2 (24m)
>
>       [............................]
>
>
>
> Thanks for the help.
>
>
>
> Best Regards,
>
>
>
> Edouard FAZENDA
>
> Technical Support
>
>
>
>
>
>
>
> Chemin du Curé-Desclouds 2, CH-1226 THONEX  +41 (0)22 869 04 40
>
>
>
>  <https://www.csti.ch/> www.csti.ch


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


