Anytime. Just adding some weight here: fully agree, users upgrading to
19.2.0 should be aware of possible balancer issues.

Best,
Laimis J.

On Tue, Dec 17, 2024, 13:54 Torkil Svensgaard <torkil@xxxxxxxx> wrote:

> Thanks guys, turning off the balancer seems to have fixed it.
>
> Best regards,
>
> Torkil
>
> On 17/12/2024 12:40, Eugen Block wrote:
> > I know 19.2.1 is already in the validation phase, but it would make
> > sense (to me) to add this to the upgrade notes for Squid
> > (https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid) until
> > the fix has been released. Similar to the note about iSCSI users.
> > Adding Zac here directly.
> >
> > Quoting Laimis Juzeliūnas <laimis.juzeliunas@xxxxxxxxxx>:
> >
> >> Hi Torkil,
> >>
> >> It is possible that you are hitting balancer issues on 19.2.0 for
> >> clusters with larger pg numbers: https://tracker.ceph.com/issues/68657
> >> Try turning it off with ceph balancer off
> >>
> >> Best,
> >> Laimis J.
> >>
> >>> On 17 Dec 2024, at 13:15, Torkil Svensgaard <torkil@xxxxxxxx> wrote:
> >>>
> >>>
> >>>
> >>> On 17/12/2024 12:05, Torkil Svensgaard wrote:
> >>>> Hi
> >>>> Running upgrade from 18.2.4 to 19.2.0 and it managed to upgrade the
> >>>> managers but no further progress.
> >>>
> >>> It now actually seems to have upgraded 1 MON, and then the
> >>> orchestrator crashed again:
> >>>
> >>> "
> >>> {
> >>>     "mon": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 4,
> >>>         "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 1
> >>>     },
> >>>     "mgr": {
> >>>         "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 3
> >>>     },
> >>>     "osd": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 548
> >>>     },
> >>>     "mds": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 3
> >>>     },
> >>>     "overall": {
> >>>         "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 555,
> >>>         "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 4
> >>>     }
> >>> }
> >>> "
> >>>
> >>> Best regards,
> >>>
> >>> Torkil
> >>>
> >>>
> >>>> If I fail over the mgr it goes:
> >>>> "
> >>>> [root@ceph-flash1 ~]# ceph orch upgrade status
> >>>> Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required by command 'orch upgrade status'): use `ceph mgr module enable orchestrator` to enable it
> >>>> "
> >>>> From mgr log:
> >>>> "
> >>>> ...
> >>>> 2024-12-17T10:43:11.729+0000 7f70efafe640 0 log_channel(audit) log [DBG] : from='client.2110010386 -' entity='client.admin' cmd=[{"prefix": "orch upgrade status", "target": ["mon-mgr", ""]}]: dispatch
> >>>> 2024-12-17T10:43:11.733+0000 7f70ebaf6640 0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Bus STARTING
> >>>> 2024-12-17T10:43:11.733+0000 7f70ebaf6640 0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Bus STARTING
> >>>> 2024-12-17T10:43:11.811+0000 7f70e7aee640 0 [dashboard INFO dashboard.module] Engine started.
> >>>> 2024-12-17T10:43:11.861+0000 7f70ebaf6640 0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Serving on https://172.21.15.148:7150
> >>>> 2024-12-17T10:43:11.861+0000 7f70ebaf6640 0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Serving on https://172.21.15.148:7150
> >>>> 2024-12-17T10:43:11.864+0000 7f70a2d7a640 0 [cephadm ERROR cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
> >>>> Traceback (most recent call last):
> >>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
> >>>>     self._connections.run(self.expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
> >>>>     self._run(expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
> >>>>     new_conn = self._from_server_socket(self.server.socket)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
> >>>>     s, ssl_env = self.server.ssl_adapter.wrap(s)
> >>>>   File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
> >>>>     s = self.context.wrap_socket(
> >>>>   File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
> >>>>     return self.sslsocket_class._create(
> >>>>   File "/lib64/python3.9/ssl.py", line 1074, in _create
> >>>>     self.do_handshake()
> >>>>   File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
> >>>>     self._sslobj.do_handshake()
> >>>> ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
> >>>> 2024-12-17T10:43:11.865+0000 7f70a2d7a640 -1 log_channel(cephadm) log [ERR] : [17/Dec/2024:10:43:11] ENGINE Error in HTTPServer.serve
> >>>> Traceback (most recent call last):
> >>>>   File "/lib/python3.9/site-packages/cheroot/server.py", line 1823, in serve
> >>>>     self._connections.run(self.expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 203, in run
> >>>>     self._run(expiration_interval)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 246, in _run
> >>>>     new_conn = self._from_server_socket(self.server.socket)
> >>>>   File "/lib/python3.9/site-packages/cheroot/connections.py", line 300, in _from_server_socket
> >>>>     s, ssl_env = self.server.ssl_adapter.wrap(s)
> >>>>   File "/lib/python3.9/site-packages/cheroot/ssl/builtin.py", line 277, in wrap
> >>>>     s = self.context.wrap_socket(
> >>>>   File "/lib64/python3.9/ssl.py", line 501, in wrap_socket
> >>>>     return self.sslsocket_class._create(
> >>>>   File "/lib64/python3.9/ssl.py", line 1074, in _create
> >>>>     self.do_handshake()
> >>>>   File "/lib64/python3.9/ssl.py", line 1343, in do_handshake
> >>>>     self._sslobj.do_handshake()
> >>>> ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:1133)
> >>>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640 0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Serving on http://172.21.15.148:8765
> >>>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640 0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Serving on http://172.21.15.148:8765
> >>>> 2024-12-17T10:43:11.963+0000 7f70ebaf6640 0 [cephadm INFO cherrypy.error] [17/Dec/2024:10:43:11] ENGINE Bus STARTED
> >>>> 2024-12-17T10:43:11.964+0000 7f70ebaf6640 0 log_channel(cephadm) log [INF] : [17/Dec/2024:10:43:11] ENGINE Bus STARTED
> >>>> ...
> >>>> "
> >>>> It will recover after some timeout, maybe 5-10 mins, and then just
> >>>> sit there with no upgrade progress.
> >>>> Nothing in mgr/cephadm/osd_remove_queue.
> >>>> Suggestions?
> >>>> Best regards,
> >>>> Torkil
> >>>
> >>> --
> >>> Torkil Svensgaard
> >>> Sysadmin
> >>> MR-Forskningssektionen, afs. 714
> >>> DRCMR, Danish Research Centre for Magnetic Resonance
> >>> Hvidovre Hospital
> >>> Kettegård Allé 30
> >>> DK-2650 Hvidovre
> >>> Denmark
> >>> Tel: +45 386 22828
> >>> E-mail: torkil@xxxxxxxx
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
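
For anyone following a similar stalled upgrade: the version map quoted above looks like the JSON that `ceph versions` prints, and it can be summarised per daemon type to see how far the upgrade got. The following is only a minimal sketch under that assumption. The function name `upgrade_progress` and the `SAMPLE` string (copied from the output in this thread) are illustrative and not part of Ceph.

```python
import json


def upgrade_progress(versions_json: str, target: str) -> dict:
    """Return {daemon_type: (upgraded, total)} from `ceph versions`-style JSON.

    `target` is a bare version number such as "19.2.0" (hypothetical helper,
    not a Ceph API).
    """
    data = json.loads(versions_json)
    progress = {}
    for daemon_type, counts in data.items():
        if daemon_type == "overall":
            continue  # "overall" is the sum of the other sections
        total = sum(counts.values())
        # Version keys look like "ceph version 19.2.0 (<sha>) squid (stable)"
        upgraded = sum(
            n for ver, n in counts.items() if f"ceph version {target} " in ver
        )
        progress[daemon_type] = (upgraded, total)
    return progress


# Sample copied from the output quoted earlier in this thread.
SAMPLE = """
{
    "mon": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 4,
        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 1
    },
    "mgr": {
        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 3
    },
    "osd": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 548
    },
    "mds": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 3
    },
    "overall": {
        "ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)": 555,
        "ceph version 19.2.0 (16063ff2022298c9300e49a547a16ffda59baf13) squid (stable)": 4
    }
}
"""

if __name__ == "__main__":
    for daemon_type, (done, total) in upgrade_progress(SAMPLE, "19.2.0").items():
        print(f"{daemon_type}: {done}/{total} on 19.2.0")
```

Running the same summary again after `ceph balancer off` should show whether the MON, OSD and MDS counts start moving, which is what the thread reports once the balancer was disabled.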