I am still struggling with this cephadm issue, does anyone have an idea? I double checked and python3 is available on all nodes: $ which python3 /usr/bin/python3 $ python3 --version Python 3.8.10 How can I fix that? and how is it possible that rebooting my nodes breaks the cephadm orchestrator? ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Tuesday, July 6th, 2021 at 8:09 AM, mabi <mabi@xxxxxxxxxxxxx> wrote: > Hello, > > After having done a rolling reboot of my Octopus 15.2.13 cluster of 8 nodes cephadm does not find python3 on the node and hence I get quite a few of the following warnings: > > [WRN] CEPHADM_HOST_CHECK_FAILED: 7 hosts fail cephadm check > > host ceph1f failed check: Can't communicate with remote host `ceph1f`, possibly because python3 is not installed there: [Errno 32] Broken pipe > > Here is the full stack trace from cephadm: > > 2021-07-06T06:03:20.798410+0000 mgr.ceph1a.xxqpph [ERR] Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): Can't communicate with remote host `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe > > Traceback (most recent call last): > > File "/usr/share/ceph/mgr/cephadm/module.py", line 1015, in _remote_connection > > conn, connr = self._get_connection(addr) > > File "/usr/share/ceph/mgr/cephadm/module.py", line 978, in _get_connection > > sudo=True if self.ssh_user != 'root' else False) > > File "/lib/python3.6/site-packages/remoto/backends/init.py", line 34, in init > > self.gateway = self._make_gateway(hostname) > > File "/lib/python3.6/site-packages/remoto/backends/init.py", line 44, in _make_gateway > > self._make_connection_string(hostname) > > File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway > > gw = gateway_bootstrap.bootstrap(io, spec) > > File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap > > bootstrap_exec(io, spec) > > File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 46, in bootstrap_exec > > "serve(io, id='%s-slave')" % spec.id, > > File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 78, in sendexec > > io.write((repr(source) + "\n").encode("ascii")) > > File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 409, in write > > self._write(data) > > BrokenPipeError: [Errno 32] Broken pipe > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > > File "/usr/share/ceph/mgr/cephadm/module.py", line 1019, in _remote_connection > > raise execnet.gateway_bootstrap.HostNotFound(msg) > > execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe > > The above exception was the direct cause of the following exception: > > Traceback (most recent call last): > > File "/usr/share/ceph/mgr/cephadm/serve.py", line 412, in _apply_all_services > > if self._apply_service(spec): > > File "/usr/share/ceph/mgr/cephadm/serve.py", line 450, in _apply_service > > self.mgr.osd_service.create_from_spec(cast(DriveGroupSpec, spec)) > > File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 51, in create_from_spec > > ret = create_from_spec_one(self.prepare_drivegroup(drive_group)) > > File "/usr/share/ceph/mgr/cephadm/utils.py", line 65, in forall_hosts_wrapper > > return CephadmOrchestrator.instance._worker_pool.map(do_work, vals) > > File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map > > return self._map_async(func, iterable, mapstar, chunksize).get() > > File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get > > raise self._value > > File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker > > result = (True, func(*args, **kwds)) > > File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar > > return list(map(*args)) > > File "/usr/share/ceph/mgr/cephadm/utils.py", line 59, in do_work > > return f(*arg) > > File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 47, in create_from_spec_one > > host, cmd, replace_osd_ids=osd_id_claims.get(host, []), env_vars=env_vars > > File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 56, in create_single_host > > out, err, code = self._run_ceph_volume_command(host, cmd, env_vars=env_vars) > > File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 271, in _run_ceph_volume_command > > error_ok=True) > > File "/usr/share/ceph/mgr/cephadm/module.py", line 1100, in _run_cephadm > > with self._remote_connection(host, addr) as tpl: > > File "/lib64/python3.6/contextlib.py", line 81, in enter > > return next(self.gen) > > File "/usr/share/ceph/mgr/cephadm/module.py", line 1046, in _remote_connection > > raise OrchestratorError(msg) from e > > orchestrator._interface.OrchestratorError: Can't communicate with remote host `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe > > I checked directly on the nodes and I can execute "python3" command and I can also SSH into all nodes with the following test command: > > ssh -F ssh_config -i ~/cephadm_private_key root@nodeX > > So I don't really understand what could have broken the cephadm orchestrator... Any ideas? The cephfs itself is still working. > > Best regards, > > Mabi _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx