Hi Jeff,

Ran into the very same issue. Filed a bug report at
https://tracker.ceph.com/issues/45574

ceph 15.2.1 on up-to-date Debian Buster.

TL;DR: The way ceph-mgr-rook's RookOrchestrator class interacts with the
python3-numpy package is borked.
Result: The cluster cannot start, since the 'devicehealth' plugin of
ceph-mgr is an always-on module.

Best,
Martin

On Mon, May 04, 2020 at 02:10:58AM -0700, Jeff Welling wrote:
> Hello my ceph-using comrades!
>
> I've been using ceph for awhile at home but wanted to update to the
> latest, Octopus. I got it installed on a single node, added a second
> node and some OSDs, and have been migrating from my original Jewel
> cluster. When I installed Octopus, I wiped the systems and installed
> Debian Buster, added the ceph apt repos instead of using the packages in
> Debian, and installed manually using ceph-vol to create Bluestore OSDs,
> using the ceph docs as my guide.
>
> Now though, one of the two Octopus nodes (the one running ceph-mgr and
> ceph-mon) are crashing weekly. I haven't been able to look into the
> cause of the crashes in detail yet as these are hobbyist systems and
> work has been exceptionally busy lately, but now after the most recent
> crash, I'm unable to start ceph-mgr and the syslog has ceph-mgr messages
> complaining of not being able to find the 'rook' module. This is rather
> confusing because though I'm aware of rook, to my knowledge I've never
> used it on my systems, and there's no mention of it in my config.
>
> I tried applying pending upgrades but that hasn't changed the behavior.
>
> I normally wouldn't dare ask for help this early in my adventure but I
> find myself in a bit of a pinch. By any chance have you hit this before,
> or know what may be causing it?
>
> Ceph is awesome. Keep up the good work, stay safe, and Thank You Kindly
> in advance!
>
> My ceph version
>
> root@zim:~# ceph --version
> ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee)
> octopus (stable)
>
> This is my ceph.config
>
> [global]
> fsid = 495d7f30-CCCC-BBBB-AAAA-ddf6ffe063d0
> mon initial members = zim.internal.justdev.ca
> mon host = 192.168.0.11
> public network = 192.168.0.0/24
> cluster network = 192.168.42.0/24
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> osd journal size = 1024
> osd pool default size = 3
> osd pool default min size = 2
> osd pool default pg num = 333
> osd pool default pgp num = 333
> osd crush chooseleaf type = 1
> rbd_default_features = 7
>
> These are the syslogs that show up when trying to restart ceph-mgr
>
> May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.065-0700
> 7fcdccaa2f40 -1 mgr[py] Module not found: 'rook'
> May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.065-0700
> 7fcdccaa2f40 -1 mgr[py] Traceback (most recent call last):
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/usr/share/ceph/mgr/rook/__init__.py", line 2, in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from .module import
> RookOrchestrator
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/usr/share/ceph/mgr/rook/module.py", line 16, in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from kubernetes import
> client, config
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/kubernetes/__init__.py", line 22, in
> <module>
> May 4 01:35:13 zim ceph-mgr[21602]: import kubernetes.stream
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/kubernetes/stream/__init__.py", line 15,
> in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from .stream import stream
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/kubernetes/stream/stream.py", line 13,
> in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from . import ws_client
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/kubernetes/stream/ws_client.py", line
> 19, in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from websocket import
> WebSocket, ABNF, enableTrace
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/websocket/__init__.py", line 22, in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from ._abnf import *
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/websocket/_abnf.py", line 34, in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: import numpy
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/numpy/__init__.py", line 142, in <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from . import core
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/numpy/core/__init__.py", line 40, in
> <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from . import multiarray
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/numpy/core/multiarray.py", line 12, in
> <module>
> May 4 01:35:13 zim ceph-mgr[21602]: from . import overrides
> May 4 01:35:13 zim ceph-mgr[21602]: File
> "/lib/python3/dist-packages/numpy/core/overrides.py", line 65, in
> <module>
> May 4 01:35:13 zim ceph-mgr[21602]: """)
> May 4 01:35:13 zim ceph-mgr[21602]: RuntimeError:
> _get_implementing_args method already has a docstring
> May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.069-0700
> 7fcdccaa2f40 -1 mgr[py] Class not found in module 'rook'
> May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.069-0700
> 7fcdccaa2f40 -1 mgr[py] Error loading module 'rook': (2) No such
> file or directory
> May 4 01:35:13 zim ceph-mgr[21602]: 2020-05-04T01:35:13.673-0700
> 7fcdccaa2f40 -1 log_channel(cluster) log [ERR] : Failed to load
> ceph-mgr modules: rook
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
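
For anyone staring at a similar wall of ceph-mgr syslog output, here is a minimal sketch of how the import chain behind the "Module not found: 'rook'" message could be pulled out of the traceback. It is only an illustration, not part of ceph: the helper names (summarize_traceback, package_of, import_chain) are made up for this example, and it assumes the input is raw syslog with one entry per line, not the wrapped and quoted form an email client produces.

```python
import re

# 'File "<path>", line N, in <scope>' frames of a Python traceback.
FRAME_RE = re.compile(r'File\s+"([^"]+)",\s+line\s+(\d+),\s+in\s+\S+')
# Final 'SomeError: message' line of a traceback.
ERROR_RE = re.compile(r'\b(\w+(?:Error|Exception)):\s+(.*)')


def summarize_traceback(lines):
    """Return ([(path, lineno), ...], (exc_type, message) or None)."""
    frames, error = [], None
    for line in lines:
        frame = FRAME_RE.search(line)
        if frame:
            frames.append((frame.group(1), int(frame.group(2))))
            continue
        exc = ERROR_RE.search(line)
        if exc:
            error = (exc.group(1), exc.group(2).strip())
    return frames, error


def package_of(path):
    """Guess the top-level package from a file path (heuristic, assumed layout)."""
    parts = path.split("/")
    for marker in ("dist-packages", "site-packages", "mgr"):
        if marker in parts:
            idx = parts.index(marker)
            if idx + 1 < len(parts):
                return parts[idx + 1]
    return parts[-1]


def import_chain(frames):
    """Collapse consecutive frames from the same package into one entry."""
    chain = []
    for path, _lineno in frames:
        pkg = package_of(path)
        if not chain or chain[-1] != pkg:
            chain.append(pkg)
    return chain


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): grep ceph-mgr /var/log/syslog | python3 this_script.py
    frames, error = summarize_traceback(sys.stdin)
    print(" -> ".join(import_chain(frames)))
    print(error)
```

The point of collapsing frames per package is that it makes visible which third-party import actually blows up, which is easy to miss when the manager only reports that the 'rook' module was not found.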