Re: Debian 12 (bookworm) / Reef 18.2.1 problems

On 1/17/24 20:49, Chris Palmer wrote:
> 
> 
> On 17/01/2024 16:11, kefu chai wrote:
>>
>>
>> On Tue, Jan 16, 2024 at 12:11 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
>>
>>     Updates on both problems:
>>
>>     Problem 1
>>     --------------
>>
>>     The bookworm/reef cephadm package needs updating to accommodate the
>>     last change in /usr/share/doc/adduser/NEWS.Debian.gz:
>>
>>        System user home defaults to /nonexistent if --home is not specified.
>>        Packages that call adduser to create system accounts should explicitly
>>        specify a location for /home (see Lintian check
>>        maintainer-script-lacks-home-in-adduser).
>>
>>     i.e. when creating the cephadm user as a system user it needs to
>>     explicitly specify the expected home directory of /home/cephadm.
>>
>>
>> Hi Chris, thank you for the bug report and the suggestion. could you please
>> file a tracker ticket, so we can track and backport the related fixes? i just
>> created https://github.com/ceph/ceph/pull/55218 in hope to alleviate the
>> problem.
> 
> I've created issue https://tracker.ceph.com/issues/64069 for this.
> 
>>
>>     A workaround is to manually create the user+directory before
>>     installing ceph.
>>
>>
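
For anyone applying that workaround by script, here is a minimal sketch in
Python. It only assembles the adduser command line. The account name and home
directory come from the report above; the function names, the shell choice and
the dry-run default are my own assumptions and are not taken from the cephadm
package's maintainer scripts.

```python
import shlex
import subprocess


def cephadm_adduser_command(user="cephadm", home="/home/cephadm", shell="/bin/bash"):
    """Build an adduser call that sets the home directory explicitly.

    Hypothetical helper: --system, --home and --shell are standard Debian
    adduser options, but the exact option set used by the cephadm package
    may differ.
    """
    return ["adduser", "--system", "--home", home, "--shell", shell, user]


def create_cephadm_user(dry_run=True):
    """Print the command, and only execute it when dry_run is False (needs root)."""
    cmd = cephadm_adduser_command()
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    create_cephadm_user(dry_run=True)
```

Keeping the dry run as the default means the command can be reviewed before
anything is changed on the host.
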
>>     Problem 2
>>     --------------
>>
>>     This is a complex set of interactions that prevent many mgr modules
>>     (including dashboard) from running. It is NOT debian-specific and will
>>     eventually bite other distributions as well. At the moment Ceph PR54710
>>     looks the most promising fix (full or partial). Detail is spread across
>>     the following:
>>
>>     https://github.com/pyca/cryptography/issues/9016
>>     https://github.com/ceph/ceph/pull/54710
>>     https://tracker.ceph.com/issues/63529
>>     https://forum.proxmox.com/threads/ceph-warning-post-upgrade-to-v8.129371/page-5
>>     https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1055212
>>     https://github.com/pyca/bcrypt/issues/694
>>
>>
>> IIUC, a backport of https://github.com/ceph/ceph/pull/54710 to reef would address this issue, am i right?
>>
>>
> 
> Unfortunately I think this may be part of a much bigger MGR problem. My understanding of the relevant background is:
> 
>  * MGR modules use python subinterpreters for isolation between modules.
>  * Several modules (including but not limited to dashboard & restful)
>    use python3-cryptography for hashing and TLS (and possibly other
>    things).
>  * python3-cryptography delegates some crypto functions to Rust
>    functions. These include bcrypt and TLS-related functions.
>  * python3-cryptography uses PyO3 to invoke Rust functions.
>  * PyO3 does not support being used by subinterpreters. In the past
>    this has been allowed but was actually unsafe. Now PyO3 throws an
>    exception when it detects multiple initialisations.
> 
> So it appears that the MGR use of these functions has always been unsafe, and is now forbidden.
> 
> PR54710 identified that the code necessary for the bcrypt hashing used during authentication could easily be written in a small amount of native python, thus avoiding the whole PyO3 area altogether.
> However there was a note in the discussions that you also had to disable TLS. And it only applied to the dashboard. My stacktrace below shows the exception during TLS initialisation.
> 
> As PyO3 updates are adopted in other linux distributions this is likely to break a number of MGR modules. As there does not seem to be any subinterpreter support in PyO3 coming soon, the only option
> may be to completely eliminate use of python3-cryptography from all MGR modules. (It is possible MGR modules may also use other python3 modules that use PyO3 to invoke Rust).
> 
> Unfortunately for us, we didn't find this until we had upgraded all MONs in a cluster to reef, at which point we can't downgrade them to quincy. And we can't upgrade the MGR. As a temporary measure
> (this cluster had MON/MGR/MDS/RGW colocated on 2 hosts) we have added another bookworm host running a reef MON to ensure we can maintain quorum. We are not sure whether it is safe to upgrade the other
> components (OSD, MDS, RGW) while the MGR remains at quincy.
> 
> 🙁

Hi there,
glad to see that this is getting some more attention. I'm the one who filed the
tracker issue about PyO3 and the Ceph MGR [0] a while ago.

Everything you've mentioned is correct - Ceph uses the rarely seen sub-interpreter
model for the MGR in order to juggle all the different MGR modules. Theoretically,
it should be possible to start a thread with one interpreter for each module
instead, but that would be far from a trivial rewrite on Ceph's side.
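
To make the failure mode concrete, here is a toy model in pure Python. It does
not use PyO3 or real sub-interpreters; the class, the function and the
interpreter labels are made up for illustration. It only mimics the behaviour
described above: the first interpreter to load the extension succeeds, and any
other interpreter in the same process gets the ImportError. Treating a repeat
load from the same interpreter as a cached import is a simplification on my part.

```python
class ToyNativeExtension:
    """Stand-in for a native module that tolerates only one owning interpreter."""

    def __init__(self):
        self._owner = None  # label of the interpreter that initialised us first

    def initialise(self, interpreter):
        if self._owner is None:
            self._owner = interpreter
            return f"initialised in {interpreter}"
        if self._owner == interpreter:
            # Same interpreter importing again: treated as a cached import here.
            return f"already loaded in {interpreter}"
        raise ImportError(
            "PyO3 modules may only be initialized once per interpreter process"
        )


def load_mgr_modules(extension, modules):
    """Give each (hypothetical) mgr module its own interpreter label and try to load."""
    results = {}
    for name in modules:
        try:
            results[name] = extension.initialise(f"subinterp-{name}")
        except ImportError as exc:
            results[name] = f"ImportError: {exc}"
    return results


if __name__ == "__main__":
    outcomes = load_mgr_modules(ToyNativeExtension(), ["dashboard", "cephadm", "restful"])
    for module, outcome in outcomes.items():
        print(f"{module}: {outcome}")
```

In this model only the first module gets a working extension, which is why the
problem is not limited to the dashboard.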

Side note: If anyone reading this wishes to contribute to PyO3 and help
implement sub-interpreter support, you can join me over on GitHub, where
I created a tracking issue for this problem some time ago. [1] I'm now finally
able to allocate more time to this again, so I will hopefully be able to make
some progress there.

[0]: https://tracker.ceph.com/issues/63529
[1]: https://github.com/PyO3/pyo3/issues/3451

> 
>>
>>
>>
>>     On 12/01/2024 14:29, Chris Palmer wrote:
>>     > More info on problem 2:
>>     >
>>     > When starting the dashboard, the mgr seems to try to initialise
>>     > cephadm, which in turn uses python crypto libraries that lead to the
>>     > python error:
>>     >
>>     > $ ceph crash info 2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52
>>     > {
>>     >     "backtrace": [
>>     >         "  File \"/usr/share/ceph/mgr/cephadm/__init__.py\", line 1, in <module>\n    from .module import CephadmOrchestrator",
>>     >         "  File \"/usr/share/ceph/mgr/cephadm/module.py\", line 15, in <module>\n    from cephadm.service_discovery import ServiceDiscovery",
>>     >         "  File \"/usr/share/ceph/mgr/cephadm/service_discovery.py\", line 20, in <module>\n    from cephadm.ssl_cert_utils import SSLCerts",
>>     >         "  File \"/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py\", line 8, in <module>\n    from cryptography import x509",
>>     >         "  File \"/lib/python3/dist-packages/cryptography/x509/__init__.py\", line 6, in <module>\n    from cryptography.x509 import certificate_transparency",
>>     >         "  File \"/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py\", line 10, in <module>\n    from cryptography.hazmat.bindings._rust import x509 as rust_x509",
>>     >         "ImportError: PyO3 modules may only be initialized once per interpreter process"
>>     >     ],
>>     >     "ceph_version": "18.2.1",
>>     >     "crash_id": "2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52",
>>     >     "entity_name": "mgr.xxxxx01",
>>     >     "mgr_module": "cephadm",
>>     >     "mgr_module_caller": "PyModule::load_subclass_of",
>>     >     "mgr_python_exception": "ImportError",
>>     >     "os_id": "12",
>>     >     "os_name": "Debian GNU/Linux 12 (bookworm)",
>>     >     "os_version": "12 (bookworm)",
>>     >     "os_version_id": "12",
>>     >     "process_name": "ceph-mgr",
>>     >     "stack_sig": "7815ad73ced094695056319d1241bf7847da19b4b0dfee7a216407b59a7e3d84",
>>     >     "timestamp": "2024-01-12T11:10:03.938478Z",
>>     >     "utsname_hostname": "xxxxx01.xxx.xxx",
>>     >     "utsname_machine": "x86_64",
>>     >     "utsname_release": "6.1.0-17-amd64",
>>     >     "utsname_sysname": "Linux",
>>     >     "utsname_version": "#1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30)"
>>     > }
>>
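
If you want to check whether your own mgr crashes match this one, a small
script can look for the same signature. This is a rough sketch: it assumes the
output of `ceph crash info <id>` has been parsed as JSON and carries the
`backtrace`, `mgr_module` and `mgr_python_exception` fields shown in the report
above. The function name and the trimmed sample are mine.

```python
import json

PYO3_MARKER = "PyO3 modules may only be initialized once per interpreter process"


def is_pyo3_subinterpreter_crash(crash):
    """Return True if a parsed crash report carries the PyO3 re-initialisation error."""
    if crash.get("mgr_python_exception") != "ImportError":
        return False
    return any(PYO3_MARKER in frame for frame in crash.get("backtrace", []))


if __name__ == "__main__":
    # Trimmed-down sample containing only the fields the check relies on.
    sample = json.loads("""
    {
        "backtrace": [
            "ImportError: PyO3 modules may only be initialized once per interpreter process"
        ],
        "mgr_module": "cephadm",
        "mgr_python_exception": "ImportError"
    }
    """)
    if is_pyo3_subinterpreter_crash(sample):
        print(f"mgr module {sample['mgr_module']} hit the PyO3 sub-interpreter error")
```

The check matches on the error text rather than on `stack_sig`, since I would
not assume the signature is identical across modules or versions.
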
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
