Re: Upgrade stalled after upgrading managers

On 17/12/2024 12:05, Torkil Svensgaard wrote:
Hi

I'm running an upgrade from 18.2.4 to 19.2.0. It upgraded the managers but has made no further progress.

It now seems to have upgraded 1 MON, and then the orchestrator crashed again:

"
{
    "mon": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 4,
        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 1
    },
    "mgr": {
        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 3
    },
    "osd": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 548
    },
    "mds": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 3
    },
    "overall": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 555,
        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 4
    }
}
"

Best regards,

Torkil
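
For anyone reading along, here is a minimal sketch of how the version summary quoted above can be reduced to "what is still on the old release". It is only an illustration: the dict is copied from the output in this thread, and the helper name pending_daemons is made up for the example, not part of any Ceph tooling.

```python
# Sketch only: count daemons per type that are not yet on the target release.
# VERSIONS mirrors the version summary quoted in this thread; the helper name
# pending_daemons is hypothetical and not part of Ceph.

REEF = "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)"
SQUID = "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)"

VERSIONS = {
    "mon": {REEF: 4, SQUID: 1},
    "mgr": {SQUID: 3},
    "osd": {REEF: 548},
    "mds": {REEF: 3},
    "overall": {REEF: 555, SQUID: 4},
}


def pending_daemons(versions, target):
    """Return {daemon_type: count} for daemons not running the target version."""
    prefix = f"ceph version {target} "
    pending = {}
    for daemon_type, by_version in versions.items():
        if daemon_type == "overall":
            # "overall" is the sum of the other sections, so skip it.
            continue
        behind = sum(n for ver, n in by_version.items() if not ver.startswith(prefix))
        if behind:
            pending[daemon_type] = behind
    return pending


if __name__ == "__main__":
    print(pending_daemons(VERSIONS, "19.2.0"))
```

With the numbers above this reports 4 MONs, 548 OSDs and 3 MDSs still on 18.2.4, which adds up to the 555 shown under "overall".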



If I fail over the mgr it goes:

"
[root@ceph-flash1 ~]# ceph orch upgrade status
Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required by command 'orch upgrade status'): use `ceph mgr module enable orchestrator` to enable it
"

From the mgr log:

"
...
2024-12-17T10:43:11.729+0000 7f70efafe640  0 log_channel(audit) log [DBG] : from='client.2110010386 -' entity='client.admin' cmd=[{"prefix": "orch upgrade status", "target": ["mon-mgr", ""]}]: dispatch
2024-12-17T10:43:11.733+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Bus STARTING
2024-12-17T10:43:11.733+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Bus STARTING
2024-12-17T10:43:11.811+0000 7f70e7aee640  0 [dashboard INFO dashboard.module] Engine started.
2024-12-17T10:43:11.861+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Serving on https://172.21.15.148:7150
2024-12-17T10:43:11.861+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Serving on https://172.21.15.148:7150
2024-12-17T10:43:11.864+0000 7f70a2d7a640  0 [cephadm ERROR cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
    self._run(expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)

2024-12-17T10:43:11.865+0000 7f70a2d7a640 -1 log_channel(cephadm) log [ERR] : [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
Traceback (most recent call last):
  File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
    self._connections.run(self.expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
    self._run(expiration_interval)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
    new_conn = self._from_server_socket(self.server.socket)
  File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
    s, ssl_env = self.server.ssl_adapter.wrap(s)
  File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
    s = self.context.wrap_socket(
  File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/lib64/python3.9/ssl.py", line 1074, in _create
    self.do_handshake()
  File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)

2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Serving on http://172.21.15.148:8765
2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Serving on http://172.21.15.148:8765
2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Bus STARTED
2024-12-17T10:43:11.964+0000 7f70ebaf6640  0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Bus STARTED
...
"

The orchestrator recovers after some timeout, maybe 5-10 minutes, and then just sits there with no upgrade progress.

There is nothing in mgr/cephadm/osd_remove_queue.

Suggestions?

Best regards,

Torkil


--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: torkil@xxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx