On 1/17/24 20:49, Chris Palmer wrote:
>
>
> On 17/01/2024 16:11, kefu chai wrote:
>>
>>
>> On Tue, Jan 16, 2024 at 12:11 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
>>
>> Updates on both problems:
>>
>> Problem 1
>> --------------
>>
>> The bookworm/reef cephadm package needs updating to accommodate the last
>> change in /usr/share/doc/adduser/NEWS.Debian.gz:
>>
>> System user home defaults to /nonexistent if --home is not specified.
>> Packages that call adduser to create system accounts should explicitly
>> specify a location for /home (see Lintian check
>> maintainer-script-lacks-home-in-adduser).
>>
>> i.e. when creating the cephadm user as a system user it needs to
>> explicitly specify the expected home directory of /home/cephadm.
>>
>>
>> Hi Chris, thank you for the bug report and the suggestion. Could you please
>> file a tracker ticket, so we can track and backport the related fixes? I just
>> created https://github.com/ceph/ceph/pull/55218 in the hope of alleviating the
>> problem.
>
> I've created issue https://tracker.ceph.com/issues/64069 for this.
>
>>
>> A workaround is to manually create the user+directory before installing
>> ceph.
>>
>>
>> Problem 2
>> --------------
>>
>> This is a complex set of interactions that prevent many mgr modules
>> (including dashboard) from running. It is NOT Debian-specific and will
>> eventually bite other distributions as well. At the moment Ceph PR54710
>> looks the most promising fix (full or partial). Detail is spread across
>> the following:
>>
>> https://github.com/pyca/cryptography/issues/9016
>> https://github.com/ceph/ceph/pull/54710
>> https://tracker.ceph.com/issues/63529
>> https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371/page-5
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1055212
>> https://github.com/pyca/bcrypt/issues/694
>>
>>
>> IIUC, a backport of https://github.com/ceph/ceph/pull/54710 to reef would address this issue, am I right?
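To illustrate the Problem 1 point (pass --home explicitly when creating the cephadm system user), here is a minimal Python sketch that builds, and only on request runs, the `adduser` call. It assumes Debian's `adduser` with its `--system` and `--home` options; the helper names are hypothetical, and this is not the change made in PR 55218.

```python
from __future__ import annotations

import shlex
import subprocess

# User name and home path as given in the thread; helper names are hypothetical.
CEPHADM_USER = "cephadm"
CEPHADM_HOME = "/home/cephadm"


def adduser_command(user: str = CEPHADM_USER, home: str = CEPHADM_HOME) -> list[str]:
    """Build the adduser argv for a system user with an explicit home.

    On bookworm, a system user's home defaults to /nonexistent when --home
    is omitted, so --home is always passed here.
    """
    return ["adduser", "--system", "--home", home, user]


def create_cephadm_user(dry_run: bool = True) -> str:
    """Return the command as a shell string; execute it only if dry_run is False.

    Executing requires root. Assumption: depending on the adduser version the
    home directory may still need to be created by hand, as the workaround
    in the thread describes.
    """
    argv = adduser_command()
    if not dry_run:
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    print(create_cephadm_user(dry_run=True))
```

With the default dry run, this only prints the command line, so it can be reviewed before being run by hand on the host.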
>>
>>
>
> Unfortunately I think this may be part of a much bigger MGR problem. My understanding of the relevant background is:
>
> * MGR modules use Python subinterpreters for isolation between modules.
> * Several modules (including but not limited to dashboard & restful) use python3-cryptography for hashing and TLS (and possibly other things).
> * python3-cryptography delegates some crypto functions to Rust functions. These include bcrypt and TLS-related functions.
> * python3-cryptography uses PyO3 to invoke Rust functions.
> * PyO3 does not support being used by subinterpreters. In the past this was allowed but was actually unsafe. Now PyO3 raises an ImportError when it detects multiple initialisations.
>
> So it appears that the MGR use of these functions has always been unsafe, and is now forbidden.
>
> PR54710 identified that the code necessary for the bcrypt hashing used during authentication could easily be written in a small amount of native Python, thus avoiding the whole PyO3 area altogether. However, there was a note in the discussions that you also had to disable TLS. And it only applied to the dashboard. My stacktrace below shows the exception during TLS initialisation.
>
> As PyO3 updates are adopted in other Linux distributions this is likely to break a number of MGR modules. As there does not seem to be any subinterpreter support in PyO3 coming soon, the only option may be to completely eliminate use of python3-cryptography from all MGR modules. (It is possible that MGR modules also use other python3 modules that use PyO3 to invoke Rust.)
>
> Unfortunately for us, we didn't find this until we had upgraded all MONs in a cluster to reef, at which point we couldn't downgrade them to quincy. And we can't upgrade the MGR. As a temporary measure (this cluster had MON/MGR/MDS/RGW colocated on 2 hosts) we have added another bookworm host running a reef MON to ensure we can maintain quorum.
> We are not sure whether it is safe to upgrade the other components (OSD, MDS, RGW) while the MGR remains at quincy.
>
> 🙁

Hi there, glad to see that this is getting some more attention. I'm the one who submitted the bug regarding PyO3 and the Ceph MGR [0] a while ago.

Everything you've mentioned is correct: Ceph uses a rare sub-interpreter model for the MGR in order to juggle all the different MGR modules. Theoretically, it should be possible to start a thread with one interpreter for each module instead, but that would be anything but a trivial rewrite on Ceph's side.

Side note: if anyone reading this wishes to contribute to PyO3 and help implement sub-interpreter support, you can join me over on GitHub, where I created a tracking issue for this problem some time ago [1]. I'm now finally able to allocate more time to this again, so I will hopefully be able to make some progress there.

[0]: https://tracker.ceph.com/issues/63529
[1]: https://github.com/PyO3/pyo3/issues/3451

>
>>
>>
>>
>> On 12/01/2024 14:29, Chris Palmer wrote:
>> > More info on problem 2:
>> >
>> > When starting the dashboard, the mgr seems to try to initialise
>> > cephadm, which in turn uses python crypto libraries that lead to the
>> > python error:
>> >
>> > $ ceph crash info 2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52
>> > {
>> >     "backtrace": [
>> >         " File \"/usr/share/ceph/mgr/cephadm/__init__.py\", line 1, in <module>\n from .module import CephadmOrchestrator",
>> >         " File \"/usr/share/ceph/mgr/cephadm/module.py\", line 15, in <module>\n from cephadm.service_discovery import ServiceDiscovery",
>> >         " File \"/usr/share/ceph/mgr/cephadm/service_discovery.py\", line 20, in <module>\n from cephadm.ssl_cert_utils import SSLCerts",
>> >         " File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 8, in <module>\n from cryptography import x509",
>> >         " File \"/lib/python3/dist-packages/cryptography/x509/__init__.py\", line 6, in <module>\n from cryptography.x509 import certificate_transparency",
>> >         " File \"/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py\", line 10, in <module>\n from cryptography.hazmat.bindings._rust import x509 as rust_x509",
>> >         "ImportError: PyO3 modules may only be initialized once per interpreter process"
>> >     ],
>> >     "ceph_version": "18.2.1",
>> >     "crash_id": "2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52",
>> >     "entity_name": "mgr.xxxxx01",
>> >     "mgr_module": "cephadm",
>> >     "mgr_module_caller": "PyModule::load_subclass_of",
>> >     "mgr_python_exception": "ImportError",
>> >     "os_id": "12",
>> >     "os_name": "Debian GNU/Linux 12 (bookworm)",
>> >     "os_version": "12 (bookworm)",
>> >     "os_version_id": "12",
>> >     "process_name": "ceph-mgr",
>> >     "stack_sig": "7815ad73ced094695056319d1241bf7847da19b4b0dfee7a216407b59a7e3d84",
>> >     "timestamp": "2024-01-12T11:10:03.938478Z",
>> >     "utsname_hostname": "xxxxx01.xxx.xxx",
>> >     "utsname_machine": "x86_64",
>> >     "utsname_release": "6.1.0-17-amd64",
>> >     "utsname_sysname": "Linux",
>> >     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30)"
>> > }
>> > _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
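The `ceph crash info` report quoted above is plain JSON, so when several mgr crashes have been recorded it can help to check them programmatically for this particular failure. A minimal Python sketch, assuming the report text (for example the output of `ceph crash info <crash-id>`) is passed in as a string; the function names are hypothetical, and only the field names and the error message are taken from the report itself:

```python
import json

# Exact wording of the ImportError in the crash report quoted above.
PYO3_REINIT_MSG = "PyO3 modules may only be initialized once per interpreter process"


def summarize_crash(report_json: str) -> dict:
    """Extract the mgr-related fields from one `ceph crash info` JSON report."""
    report = json.loads(report_json)
    backtrace = report.get("backtrace") or []
    return {
        "crash_id": report.get("crash_id"),
        "ceph_version": report.get("ceph_version"),
        "mgr_module": report.get("mgr_module"),
        "exception": report.get("mgr_python_exception"),
        # In the report above, the final backtrace entry is the exception message.
        "last_line": backtrace[-1] if backtrace else None,
    }


def is_pyo3_reinit_crash(summary: dict) -> bool:
    """Return True if a summarized crash looks like the PyO3 re-initialisation error."""
    last_line = summary.get("last_line") or ""
    return summary.get("exception") == "ImportError" and PYO3_REINIT_MSG in last_line


if __name__ == "__main__":
    # Trimmed-down version of the report above (most fields omitted).
    sample = json.dumps({
        "backtrace": [
            " File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 8, in <module>\n from cryptography import x509",
            "ImportError: " + PYO3_REINIT_MSG,
        ],
        "ceph_version": "18.2.1",
        "mgr_module": "cephadm",
        "mgr_python_exception": "ImportError",
    })
    summary = summarize_crash(sample)
    print(summary["mgr_module"], is_pyo3_reinit_crash(summary))  # cephadm True
```

Matching on the message text is deliberately simple. It only tells you which module hit the PyO3 re-initialisation error; it does not fix anything.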