Hi,

How I got here
--------------

Yesterday evening I added an OSD to my hobby system, most likely using these commands:

# ceph-volume raw prepare --bluestore --data /dev/bcache0
# cephadm adopt --style legacy --name osd.20

After not having much luck with that, I also used this command (I don't have the specifics):

% ceph orch daemon add osd tutu:/tmp/bcache0

per https://docs.ceph.com/en/latest/cephadm/osd/#creating-new-osds

...which I think resulted in the new osd.18, putting bcache0 inside its own VG and its own LV. I don't have an actual log of the commands I used, but I did end up with the new osds 18 and 20. This was also my first time using these commands; my previous ways of achieving the same were a bit more long-winded.

According to my monitoring, my main issue appeared around the same time. In this post I'm not worried about the state of the OSD itself, only about management.

Actual issue
------------

So when I now issue "ceph orch ls" I get the following output:

% ceph orch ls
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1204, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 503, in _list_services
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
    raise e
AssertionError: not

("ceph orch ps" works fine.)

Similarly, the output of "ceph -s" is:

% ceph -s
  ...
  health: HEALTH_ERR
          Module 'cephadm' has failed: 'not'
  ...

The relevant log from the manager, as per the mgr web interface, is:

_Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 294, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 107, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1333, in describe_service
    hosts=[dd.hostname]
  File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 429, in __init__
    assert service_type in ServiceSpec.KNOWN_SERVICE_TYPES, service_type
AssertionError: not

I also noticed this seemingly highly relevant bit in my "ceph orch ps":

NAME        HOST  STATUS   REFRESHED  AGE  VERSION    IMAGE NAME               IMAGE ID   CONTAINER ID
not.osd.20  tutu  stopped  13h ago    14h  <unknown>  docker.io/ceph/ceph:v15  <unknown>  <unknown>

I'm not quite sure how I ended up with that, but I wouldn't exclude operator error :) such as entering "cephadm adopt --style legacy --name not.osd.20" (but WHY..).

Sure enough, there is no such docker container running on the host, and the job ceph-3046312a-e453-11ea-b1f5-b42e993e47fc@osd.20.service has failed with "RuntimeError: could not find osd.20 with osd_fsid 212c336a-9516-4818-aeaf-2d0c24c4ca65". That error makes sense, as both osds 18 and 20 try to use the same bcache0, but the actual bluestore filesystem is inside the VG/LV used by 18, whereas 20 tries to use bcache0 directly. As I said, though, I won't worry about the OSD at the moment.
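If I'm reading the tracebacks right, the daemon name itself is what trips things up: cephadm seems to take everything before the first dot of a daemon name as its service/daemon type, so "not.osd.20" parses as type "not", which then fails the assert against ServiceSpec.KNOWN_SERVICE_TYPES. A minimal sketch of my understanding (the helper function and the list of types below are my own illustration, not the actual mgr code):

    # My reading of the failure, not real cephadm code: the service/daemon
    # type appears to be whatever precedes the first dot of the daemon name.
    KNOWN_SERVICE_TYPES = ['mon', 'mgr', 'osd', 'mds', 'rgw', 'crash']  # among others

    def daemon_type_of(daemon_name: str) -> str:
        # cephadm names daemons "<type>.<id>", e.g. "osd.20"
        return daemon_name.split('.', 1)[0]

    for name in ('osd.20', 'not.osd.20'):
        service_type = daemon_type_of(name)
        try:
            assert service_type in KNOWN_SERVICE_TYPES, service_type
            print(name, '-> ok, service type', service_type)
        except AssertionError as err:
            print(name, '-> AssertionError:', err)  # err is the offending type, "not"

If that guess is correct, I'd expect every code path that keys on the daemon type to choke on the "not" prefix the same way, which also seems to be what the removal attempt below runs into.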
I tried the command "ceph orch daemon rm not.osd.20", although I'm not sure it even should work. In any case, it fails the same way:

% ceph orch daemon rm not.osd.20
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1204, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 320, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 102, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 1061, in _daemon_rm
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 642, in raise_if_exception
    raise e
KeyError: 'not'

with the following entries in the mgr log:

5/13/21 1:26:06 PM [ERR] _Promise failed
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 294, in _finalize
    next_result = self._on_complete(self._value)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 107, in <lambda>
    return CephadmCompletion(on_complete=lambda _: f(*args, **kwargs))
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1515, in remove_daemons
    return self._remove_daemons(args)
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 65, in forall_hosts_wrapper
    return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
  File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 58, in do_work
    return f(self, *arg)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1804, in _remove_daemons
    return self._remove_daemon(name, host)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1818, in _remove_daemon
    self.cephadm_services[daemon_type].pre_remove(daemon)
KeyError: 'not'

5/13/21 1:26:06 PM [ERR] executing _remove_daemons((<cephadm.module.CephadmOrchestrator object at 0x7f1f4fec2bd0>, [('not.osd.20', 'tutu')])) failed.
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/utils.py", line 58, in do_work
    return f(self, *arg)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1804, in _remove_daemons
    return self._remove_daemon(name, host)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 1818, in _remove_daemon
    self.cephadm_services[daemon_type].pre_remove(daemon)
KeyError: 'not'

I also tried "ceph orch daemon rm foo.bar.42", which gives the error "Error EINVAL: Unable to find daemon(s) ['foo.bar.42']", so the command itself does seem to be processed correctly, at least in part.

Thanks for any assistance!

-- 
 _____________________________________________________________________
/ __// /__ ____ __                                       Erkki Seppälä\
\ \ / /_ / // // /\ \/ /
 \ / /_/ /_/ \___/ /_/\_\@inside.org      http://www.inside.org/~flux/

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx