Hello,

After having done a rolling reboot of my Octopus 15.2.13 cluster of 8 nodes, cephadm does not find python3 on the node and hence I get quite a few of the following warnings:

    [WRN] CEPHADM_HOST_CHECK_FAILED: 7 hosts fail cephadm check
        host ceph1f failed check: Can't communicate with remote host `ceph1f`, possibly because python3 is not installed there: [Errno 32] Broken pipe

Here is the full stack trace from cephadm:

    2021-07-06T06:03:20.798410+0000 mgr.ceph1a.xxqpph [ERR] Failed to apply osd.all-available-devices spec DriveGroupSpec(name=all-available-devices->placement=PlacementSpec(host_pattern='*'), service_id='all-available-devices', service_type='osd', data_devices=DeviceSelection(all=True), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): Can't communicate with remote host `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe
    Traceback (most recent call last):
      File "/usr/share/ceph/mgr/cephadm/module.py", line 1015, in _remote_connection
        conn, connr = self._get_connection(addr)
      File "/usr/share/ceph/mgr/cephadm/module.py", line 978, in _get_connection
        sudo=True if self.ssh_user != 'root' else False)
      File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 34, in __init__
        self.gateway = self._make_gateway(hostname)
      File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 44, in _make_gateway
        self._make_connection_string(hostname)
      File "/lib/python3.6/site-packages/execnet/multi.py", line 134, in makegateway
        gw = gateway_bootstrap.bootstrap(io, spec)
      File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 102, in bootstrap
        bootstrap_exec(io, spec)
      File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 46, in bootstrap_exec
        "serve(io, id='%s-slave')" % spec.id,
      File "/lib/python3.6/site-packages/execnet/gateway_bootstrap.py", line 78, in sendexec
        io.write((repr(source) + "\n").encode("ascii"))
      File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 409, in write
        self._write(data)
    BrokenPipeError: [Errno 32] Broken pipe

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/share/ceph/mgr/cephadm/module.py", line 1019, in _remote_connection
        raise execnet.gateway_bootstrap.HostNotFound(msg)
    execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/share/ceph/mgr/cephadm/serve.py", line 412, in _apply_all_services
        if self._apply_service(spec):
      File "/usr/share/ceph/mgr/cephadm/serve.py", line 450, in _apply_service
        self.mgr.osd_service.create_from_spec(cast(DriveGroupSpec, spec))
      File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 51, in create_from_spec
        ret = create_from_spec_one(self.prepare_drivegroup(drive_group))
      File "/usr/share/ceph/mgr/cephadm/utils.py", line 65, in forall_hosts_wrapper
        return CephadmOrchestrator.instance._worker_pool.map(do_work, vals)
      File "/lib64/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/lib64/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
      File "/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/lib64/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/usr/share/ceph/mgr/cephadm/utils.py", line 59, in do_work
        return f(*arg)
      File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 47, in create_from_spec_one
        host, cmd, replace_osd_ids=osd_id_claims.get(host, []), env_vars=env_vars
      File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 56, in create_single_host
        out, err, code = self._run_ceph_volume_command(host, cmd, env_vars=env_vars)
      File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 271, in _run_ceph_volume_command
        error_ok=True)
      File "/usr/share/ceph/mgr/cephadm/module.py", line 1100, in _run_cephadm
        with self._remote_connection(host, addr) as tpl:
      File "/lib64/python3.6/contextlib.py", line 81, in __enter__
        return next(self.gen)
      File "/usr/share/ceph/mgr/cephadm/module.py", line 1046, in _remote_connection
        raise OrchestratorError(msg) from e
    orchestrator._interface.OrchestratorError: Can't communicate with remote host `ceph1d`, possibly because python3 is not installed there: [Errno 32] Broken pipe

I checked directly on the nodes and I can execute the "python3" command, and I can also SSH into all nodes with the following test command:

    ssh -F ssh_config -i ~/cephadm_private_key root@nodeX

So I don't really understand what could have broken the cephadm orchestrator... Any ideas? The cephfs itself is still working.

Best regards,
Mabi