Re: Upgrade stalled after upgrading managers


 



Anytime.
Just adding some weight here: fully agree, users upgrading to 19.2.0 should
be aware of possible balancer issues.
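
For anyone who wants to script the workaround, here is a minimal sketch of
"check whether the balancer is on and, if so, turn it off". It only wraps the
`ceph balancer status` and `ceph balancer off` commands. The helper names
(run_ceph, disable_balancer_if_active) are made up for illustration, and it
assumes the status output is JSON with a boolean "active" field, so please
verify that on your release before relying on it.

```python
import json
import subprocess
from typing import Callable, List


def run_ceph(args: List[str]) -> str:
    """Run a ceph CLI command and return its stdout (needs an admin keyring)."""
    result = subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def disable_balancer_if_active(run: Callable[[List[str]], str] = run_ceph) -> bool:
    """Turn the balancer off if it is currently on.

    Returns True if the balancer was active and has been switched off,
    False if it was already off.
    """
    # Assumption: the status is JSON containing a boolean "active" field.
    status = json.loads(run(["balancer", "status", "--format", "json"]))
    if not status.get("active", False):
        return False
    run(["balancer", "off"])
    return True
```

The command runner is passed in as a parameter so the logic can be tried
against canned output first, without touching a live cluster.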


Best,
Laimis J.

On Tue, Dec 17, 2024, 13:54 Torkil Svensgaard <torkil@xxxxxxxx> wrote:

> Thanks guys, turning off the balancer seems to have fixed it.
>
> Best regards,
>
> Torkil
>
> On 17/12/2024 12:40, Eugen Block wrote:
> > I know 19.2.1 is already in the validation phase, but it would make
> > sense (to me) to add this to the upgrade notes for Squid
> > (https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid) until
> > the fix has been released. Similar to the note about iSCSI users.
> > Adding Zac here directly.
> >
> > Quoting Laimis Juzeliūnas <laimis.juzeliunas@xxxxxxxxxx>:
> >
> >> Hi Torkil,
> >>
> >> You may be hitting the balancer issue on 19.2.0 that affects clusters
> >> with larger PG counts: https://tracker.ceph.com/issues/68657
> >> Try turning the balancer off with: ceph balancer off
> >>
> >> Best,
> >> Laimis J.
> >>
> >>> On 17 Dec 2024, at 13:15, Torkil Svensgaard <torkil@xxxxxxxx> wrote:
> >>>
> >>>
> >>>
> >>> On 17/12/2024 12:05, Torkil Svensgaard wrote:
> >>>> Hi
> >>>> I'm running an upgrade from 18.2.4 to 19.2.0. It managed to upgrade
> >>>> the managers but has made no further progress.
> >>>
> >>> It now actually seems to have upgraded 1 MON, but then the
> >>> orchestrator crashed again:
> >>>
> >>> "
> >>> {
> >>>     "mon": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 4,
> >>>         "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 1
> >>>     },
> >>>     "mgr": {
> >>>         "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 3
> >>>     },
> >>>     "osd": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 548
> >>>     },
> >>>     "mds": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 3
> >>>     },
> >>>     "overall": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 555,
> >>>         "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 4
> >>>     }
> >>> }
> >>> "
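
The JSON quoted above looks like `ceph versions` output. A rough sketch of
how that output could be reduced to "which daemon types are still waiting
for the upgrade" follows. The function names (summarize_versions,
pending_daemons) are hypothetical helpers, not part of Ceph, and the code
assumes every key has the "ceph version X.Y.Z (...)" form shown in this
thread.

```python
import json
import re
from typing import Dict


def summarize_versions(versions_json: str) -> Dict[str, Dict[str, int]]:
    """Map each daemon type to {short_version: daemon_count}."""
    data = json.loads(versions_json)
    summary: Dict[str, Dict[str, int]] = {}
    for daemon_type, counts in data.items():
        per_version: Dict[str, int] = {}
        for banner, count in counts.items():
            # Pull "18.2.4" out of "ceph version 18.2.4 (<sha>) reef (stable)".
            match = re.search(r"ceph version (\S+)", banner)
            short = match.group(1) if match else banner
            per_version[short] = per_version.get(short, 0) + count
        summary[daemon_type] = per_version
    return summary


def pending_daemons(versions_json: str, target: str) -> Dict[str, int]:
    """Count, per daemon type, the daemons not yet running the target version."""
    pending: Dict[str, int] = {}
    for daemon_type, per_version in summarize_versions(versions_json).items():
        if daemon_type == "overall":
            continue  # "overall" is an aggregate, not a daemon type
        remaining = sum(n for v, n in per_version.items() if v != target)
        if remaining:
            pending[daemon_type] = remaining
    return pending
```

Fed the output above with a target of "19.2.0", this would report 4 mons,
548 OSDs and 3 MDS daemons still pending, and no mgr entry, which matches
the description of an upgrade that stalled right after the managers.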
> >>>
> >>> Best regards,
> >>>
> >>> Torkil
> >>>
> >>>
> >>>> If I fail over the mgr, I get:
> >>>> "
> >>>> [root@ceph-flash1 ~]# ceph orch upgrade status
> >>>> Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required
> >>>> by command 'orch upgrade status'): use `ceph mgr module enable
> >>>> orchestrator` to enable it
> >>>> "
> >>>> From mgr log:
> >>>> "
> >>>> ...
> >>>> 2024-12-17T10:43:11.729+0000 7f70efafe640  0 log_channel(audit) log
> >>>> [DBG] : from='client.2110010386 -' entity='client.admin'
> >>>> cmd=[{"prefix": "orch upgrade status", "target": ["mon-mgr", ""]}]:
> >>>> dispatch
> >>>> 2024-12-17T10:43:11.733+0000 7f70ebaf6640  0 [cephadm INFO
> >>>> cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Bus STARTING
> >>>> 2024-12-17T10:43:11.733+0000 7f70ebaf6640  0 log_channel(cephadm)
> >>>> log [INF] : [17/Dec/2024:10:43:11] ENGINE Bus STARTING
> >>>> 2024-12-17T10:43:11.811+0000 7f70e7aee640  0 [dashboard INFO
> >>>> dashboard.module] Engine started.
> >>>> 2024-12-17T10:43:11.861+0000 7f70ebaf6640  0 [cephadm INFO
> >>>> cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Serving on
> >>>> https://172.21.15.148:7150
> >>>> 2024-12-17T10:43:11.861+0000 7f70ebaf6640  0 log_channel(cephadm)
> >>>> log [INF] : [17/Dec/2024:10:43:11] ENGINE Serving on
> >>>> https://172.21.15.148:7150
> >>>> 2024-12-17T10:43:11.864+0000 7f70a2d7a640  0 [cephadm ERROR
> >>>> cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Error in
> >>>> HTTPServer.serve
> >>>> Traceback (most recent call last):
> >>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line 1823,
> >>>> in serve
> >>>>     self._connections.run(self.expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line
> >>>> 203, in run
> >>>>     self._run(expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line
> >>>> 246, in _run
> >>>>     new_conn = self._from_server_socket(self.server.socket)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line
> >>>> 300, in _from_server_socket
> >>>>     s, ssl_env = self.server.ssl_adapter.wrap(s)
> >>>>   File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line
> >>>> 277, in wrap
> >>>>     s = self.context.wrap_socket(
> >>>>   File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
> >>>>     return self.sslsocket_class._create(
> >>>>   File "/lib64/python3.9/ssl.py", line 1074, in _create
> >>>>     self.do_handshake()
> >>>>   File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
> >>>>     self._sslobj.do_handshake()
> >>>> ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF)
> >>>> (_ssl.c:1133)
> >>>> 2024-12-17T10:43:11.865+0000 7f70a2d7a640 -1 log_channel(cephadm)
> >>>> log [ERR] : [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
> >>>> Traceback (most recent call last):
> >>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line 1823,
> >>>> in serve
> >>>>     self._connections.run(self.expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line
> >>>> 203, in run
> >>>>     self._run(expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line
> >>>> 246, in _run
> >>>>     new_conn = self._from_server_socket(self.server.socket)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line
> >>>> 300, in _from_server_socket
> >>>>     s, ssl_env = self.server.ssl_adapter.wrap(s)
> >>>>   File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line
> >>>> 277, in wrap
> >>>>     s = self.context.wrap_socket(
> >>>>   File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
> >>>>     return self.sslsocket_class._create(
> >>>>   File "/lib64/python3.9/ssl.py", line 1074, in _create
> >>>>     self.do_handshake()
> >>>>   File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
> >>>>     self._sslobj.do_handshake()
> >>>> ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF)
> >>>> (_ssl.c:1133)
> >>>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 [cephadm INFO
> >>>> cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Serving on
> >>>> http://172.21.15.148:8765
> >>>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 log_channel(cephadm)
> >>>> log [INF] : [17/Dec/2024:10:43:11] ENGINE Serving on
> >>>> http://172.21.15.148:8765
> >>>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640  0 [cephadm INFO
> >>>> cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Bus STARTED
> >>>> 2024-12-17T10:43:11.964+0000 7f70ebaf6640  0 log_channel(cephadm)
> >>>> log [INF] : [17/Dec/2024:10:43:11] ENGINE Bus STARTED
> >>>> ...
> >>>> "
> >>>> It will recover after some timeout, maybe 5-10 mins, and then just
> >>>> sit there with no upgrade progress.
> >>>> Nothing in mgr/cephadm/osd_remove_queue.
> >>>> Suggestions?
> >>>> Best regards,
> >>>> Torkil
> >>>
> >>> --
> >>> Torkil Svensgaard
> >>> Sysadmin
> >>> MR-Forskningssektionen, afs. 714
> >>> DRCMR, Danish Research Centre for Magnetic Resonance
> >>> Hvidovre Hospital
> >>> Kettegård Allé 30
> >>> DK-2650 Hvidovre
> >>> Denmark
> >>> Tel: +45 386 22828
> >>> E-mail: torkil@xxxxxxxx
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Torkil Svensgaard
> Sysadmin
> MR-Forskningssektionen, afs. 714
> DRCMR, Danish Research Centre for Magnetic Resonance
> Hvidovre Hospital
> Kettegård Allé 30
> DK-2650 Hvidovre
> Denmark
> Tel: +45 386 22828
> E-mail: torkil@xxxxxxxx
>



